Multiple issues #47

Closed
prasang-gupta opened this issue Jul 2, 2022 · 2 comments

@prasang-gupta

Hi,

I recently used this repository for a project and ran into several shortcomings that could be improved:

  • The ability to auto-populate the task fields from the labels found in the dataset, and removal of the restriction that the task argument must be one of the standard choices:
    parser.add_argument('--task', type=str, default=None, required=True, choices=['ace04', 'ace05', 'scierc'])
  • A --no-cuda argument in run_entity as well, so that users can test the code on a machine without CUDA without having to modify it. This is already present in run_relations but is missing from this file.
  • The lack of --do_predict, which has already been raised in "How to run a pretrained model on unlabeled data?" (#30)
  • Some optimisations, for example:
    model = EntityModel(args, num_ner_labels=num_ner_labels)

    dev_data = Dataset(args.dev_data)
    dev_samples, dev_ner = convert_dataset_to_samples(dev_data, args.max_span_length, ner_label2id=ner_label2id, context_window=args.context_window)
    dev_batches = batchify(dev_samples, args.eval_batch_size)

    if args.do_train:
        train_data = Dataset(args.train_data)

All of these lines can be moved under if args.do_train, since they are relevant only to that section; the args.do_eval section already loads the model again, like so:

    if args.do_eval:
        args.bert_model_dir = args.output_dir
        model = EntityModel(args, num_ner_labels=num_ner_labels)
  • Other small optimisations and some typos
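
The reordering described above could be sketched roughly as follows. This is only an outline of the control flow, not the repository's code: load_model, load_batches, train and evaluate are hypothetical stand-ins passed in as callables, and the sketch assumes that evaluation reads the dev data on its own.

```python
from types import SimpleNamespace


def run(args, load_model, load_batches, train, evaluate):
    """Control-flow sketch; the four callables are hypothetical stand-ins."""
    if args.do_train:
        # Everything needed only for training is created inside this branch.
        model = load_model(args.bert_model_dir)
        dev_batches = load_batches(args.dev_data)
        train_batches = load_batches(args.train_data)
        train(model, train_batches, dev_batches)

    if args.do_eval:
        # Evaluation reloads the model from the output directory,
        # so nothing built above is required when only evaluating.
        model = load_model(args.output_dir)
        evaluate(model, load_batches(args.dev_data))


if __name__ == "__main__":
    demo_args = SimpleNamespace(
        do_train=False, do_eval=True,
        bert_model_dir="base", output_dir="out",
        train_data="train.json", dev_data="dev.json",
    )
    run(
        demo_args,
        load_model=lambda path: print("load model from", path),
        load_batches=lambda path: print("load data from", path),
        train=lambda *a: print("train"),
        evaluate=lambda *a: print("evaluate"),
    )
```

With only do_eval set, the model and dev data are each loaded once instead of twice.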

I am thinking of creating a separate issue for each of these and starting work on some of the fixes/features in the near future, as I've already made quite a few changes locally for my project. I just wanted to get your thoughts on these issues and to know whether there is something I am missing.
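
For the --no-cuda item in the list above, a minimal sketch of how such a flag might be wired up is shown here. The flag spelling follows the text of this issue; the helper names build_parser, cuda_is_available and pick_device are hypothetical, and how run_entity actually selects its device is not shown in this thread.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Hypothetical flag for run_entity; argparse exposes it as args.no_cuda.
    parser.add_argument('--no-cuda', action='store_true',
                        help='Do not use CUDA even when it is available')
    return parser


def cuda_is_available() -> bool:
    """Report CUDA availability, treating a missing torch install as 'no'."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def pick_device(no_cuda: bool, cuda_available: bool) -> str:
    """Return 'cuda' only when it is both available and not disabled."""
    return 'cuda' if cuda_available and not no_cuda else 'cpu'


if __name__ == '__main__':
    args = build_parser().parse_args()
    print(pick_device(args.no_cuda, cuda_is_available()))
```

Because the flag defaults to False, existing command lines would behave as before.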

@a3616001
Member

a3616001 commented Jul 5, 2022

Hi, thanks for using our code and raising these issues. All the fixes/optimizations look good to me! You may create individual pull requests for them.
One thing to note: please make sure the modifications stay compatible with the current code, because I think it should remain easy for people to reproduce the results in the paper (e.g., you may add an option for --task that enables automatic label collection while still supporting --task ace05).
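
One way the backwards-compatible --task behaviour suggested above might look is sketched below. Everything here is an assumption made for illustration: the 'auto' value, the helper names, and the input layout (each document carrying an 'ner' field with one list of [start, end, label] spans per sentence). The preset table is passed in by the caller rather than copied from the repository.

```python
def collect_ner_labels(documents):
    """Return the sorted set of entity labels found in the documents.

    Assumes each document has an 'ner' field: one list per sentence,
    each holding [start, end, label] spans.
    """
    labels = set()
    for doc in documents:
        for sentence_spans in doc.get('ner', []):
            for _start, _end, label in sentence_spans:
                labels.add(label)
    return sorted(labels)


def resolve_ner_labels(task, documents, presets):
    """Use the fixed preset for known tasks; collect labels for 'auto'."""
    if task in presets:
        # Standard tasks keep their fixed label lists, so published
        # results stay reproducible.
        return list(presets[task])
    if task == 'auto':  # hypothetical new value for --task
        return collect_ner_labels(documents)
    raise ValueError(f'Unknown task: {task!r}')


if __name__ == '__main__':
    docs = [{'ner': [[[0, 1, 'PER']], [[2, 3, 'ORG'], [4, 4, 'PER']]]}]
    print(resolve_ner_labels('auto', docs, presets={}))
```

Keeping the preset lookup first means --task ace05 and the other standard values would continue to resolve exactly as they do now.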

@prasang-gupta
Author

Cool, I will ensure backwards compatibility.
