Multiple issues #47

prasang-gupta · 2022-07-02T16:00:41Z

Hi,

I recently used this repository for a project and ran into several issues and shortcomings that could be improved. Here are some of them:

Auto-populate the task fields from the labels found in the dataset, and remove the restriction that the task argument must be one of the standard choices:

    parser.add_argument('--task', type=str, default=None, required=True, choices=['ace04', 'ace05', 'scierc'])

Add a --no-cuda argument to run_entity as well, so that users can test the code on a machine without CUDA and without having to modify the code. This is already present in run_relations, but is missing from this file for some reason.
There is no --do_predict option, which has already been raised in "How to run a pretrained model on unlabeled data?" (#30).
Some optimisations, for example:

    model = EntityModel(args, num_ner_labels=num_ner_labels)

    dev_data = Dataset(args.dev_data)
    dev_samples, dev_ner = convert_dataset_to_samples(dev_data, args.max_span_length, ner_label2id=ner_label2id, context_window=args.context_window)
    dev_batches = batchify(dev_samples, args.eval_batch_size)

    if args.do_train:
        train_data = Dataset(args.train_data)

All of these lines can be moved under if args.do_train, since they are relevant only to that section. The args.do_eval section already loads the model again, like so:

    if args.do_eval:
        args.bert_model_dir = args.output_dir
        model = EntityModel(args, num_ner_labels=num_ner_labels)

Other small optimisations and some typos
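
To make the --no-cuda and loading-order points concrete, here is a minimal sketch, not the repository's actual code. The helper names (build_parser, pick_device, run) and the build_model / load_batches callbacks are hypothetical stand-ins for EntityModel and the dataset helpers, and the flag is spelled as in the post above (argparse exposes it as args.no_cuda). Nothing is loaded until the branch that needs it runs.

```python
import argparse


def build_parser():
    """Hypothetical subset of the run_entity arguments discussed above."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--do_train", action="store_true")
    parser.add_argument("--do_eval", action="store_true")
    # Assumed spelling, taken from the post; available as args.no_cuda.
    parser.add_argument("--no-cuda", action="store_true",
                        help="run on CPU even if CUDA is available")
    return parser


def pick_device(no_cuda, cuda_available):
    """Use CUDA only when it is available and not explicitly disabled."""
    return "cuda" if cuda_available and not no_cuda else "cpu"


def run(args, build_model, load_batches, cuda_available):
    """Load the model and data only inside the branch that uses them.

    build_model(source, device) and load_batches(split) are hypothetical
    callbacks standing in for the real model and dataset code.
    """
    device = pick_device(args.no_cuda, cuda_available)
    if args.do_train:
        model = build_model("pretrained", device)
        dev_batches = load_batches("dev")
        train_batches = load_batches("train")
        # ... training loop using model, train_batches, dev_batches ...
    if args.do_eval:
        # Mirrors the snippet above: the model is reloaded from output_dir.
        model = build_model("output_dir", device)
        eval_batches = load_batches("eval")
        # ... evaluation using model and eval_batches ...
    return device
```

With this layout, an evaluation-only run never builds the pretrained model or reads the training and dev data.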

I am thinking of creating an independent issue for each of these and starting to work on some of these fixes/features in the near future, as I've already made quite a few changes locally for my project. I just wanted to get your thoughts on these issues and to know if there is something I am missing.

a3616001 · 2022-07-05T17:09:21Z

Hi, thanks for using our code and raising these issues. All the fixes/optimizations look good to me! You may create individual pull requests for them.
One thing to note: please make sure the modifications are compatible with the current code, because I think it should stay easy for people to reproduce the results in the paper (e.g., you may add an option for --task to enable automatically collecting labels, and still support --task ace05).
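
One possible shape for such a backwards-compatible --task option is sketched below. It is only an illustration under stated assumptions: the "auto" value, the function names, and the presets mapping are hypothetical, and the data layout (each document carrying an "ner" list with one list of [start, end, label] spans per sentence) is assumed rather than taken from this thread. The real preset label lists are not reproduced here and would be passed in by the caller.

```python
import argparse


def collect_ner_labels(documents):
    """Gather the distinct NER labels found in the dataset.

    Assumes each document is a dict whose "ner" entry holds, per sentence,
    a list of [start, end, label] spans (an assumption about the format).
    """
    labels = set()
    for doc in documents:
        for sentence_spans in doc.get("ner", []):
            for _start, _end, label in sentence_spans:
                labels.add(label)
    return sorted(labels)


def resolve_ner_labels(task, presets, documents):
    """Return preset labels for a known task, or collect them for "auto"."""
    if task == "auto":  # hypothetical new value
        return collect_ner_labels(documents)
    return list(presets[task])


def build_parser(presets):
    """Keep the existing task choices and add the hypothetical "auto"."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--task", type=str, default=None, required=True,
                        choices=sorted(presets) + ["auto"])
    return parser
```

Existing invocations such as --task ace05 keep resolving to their preset labels, so only users who opt into the new value see different behaviour.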

prasang-gupta · 2022-07-06T07:59:01Z

Cool, will ensure backwards compatibility.

prasang-gupta closed this as completed Jul 6, 2022
