
Enable pylint in the github workflow #63

Merged: 4 commits merged into foundation-model-stack:main on Mar 6, 2024
Conversation

tedhtchang (Collaborator) commented Feb 27, 2024

Changes

  1. Add pylint to workflow
  2. Fix existing pylint errors

Closes #59
To test locally:

git clone https://github.com/tedhtchang/fms-hf-tuning -b Enable-pylint
cd fms-hf-tuning
pip install -r setup_requirement.txt
tox run -e lint

tedhtchang (Collaborator, Author)

@hickeyma What's your recommendation for the following warning/convention messages? They seem reasonable to me. Do we have to correct every one?

....
************* Module tuning.sft_trainer
tuning/sft_trainer.py:84:13: W1514: Using open without explicitly specifying an encoding (unspecified-encoding)
tuning/sft_trainer.py:171:4: W1203: Use lazy % or .format() formatting in logging functions (logging-fstring-interpolation)
tuning/sft_trainer.py:173:8: W1203: Use lazy % or .format() formatting in logging functions (logging-fstring-interpolation)
tuning/sft_trainer.py:208:21: C3001: Lambda expression assigned to a variable. Define a function using the "def" keyword instead. (unnecessary-lambda-assignment)
tuning/sft_trainer.py:215:4: W1203: Use lazy % or .format() formatting in logging functions (logging-fstring-interpolation)
tuning/sft_trainer.py:220:8: W1203: Use lazy % or .format() formatting in logging functions (logging-fstring-interpolation)
************* Module tuning.utils.merge_model_utils
tuning/utils/merge_model_utils.py:84:9: W1514: Using open without explicitly specifying an encoding (unspecified-encoding)
************* Module scripts.run_inference
scripts/run_inference.py:80:13: W1514: Using open without explicitly specifying an encoding (unspecified-encoding)
scripts/run_inference.py:84:13: W1514: Using open without explicitly specifying an encoding (unspecified-encoding)
scripts/run_inference.py:247:13: W1514: Using open without explicitly specifying an encoding (unspecified-encoding)
scripts/run_inference.py:260:9: W1514: Using open without explicitly specifying an encoding (unspecified-encoding)

....
Messages
--------
+------------------------------+------------+
|message id                    |occurrences |
+==============================+============+
|unspecified-encoding          |6           |
+------------------------------+------------+
|logging-fstring-interpolation |4           |
+------------------------------+------------+
|unnecessary-lambda-assignment |1           |
+------------------------------+------------+
-----------------------------------
Your code has been rated at 9.69/10
lint: exit 20 (15.69 seconds) /home/runner/work/fms-hf-tuning/fms-hf-tuning> pylint tuning 'scripts/*.py' pid=2242
.pkg: _exit> python /opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
  lint: FAIL code 20 (139.39=setup[123.70]+cmd[15.69] seconds)
  evaluation failed :( (139.45 seconds)
Error: Process completed with exit code 20.
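For reference, the typical fixes for these three message types look roughly like the following (a rough sketch; the file name, logger setup, and field names are illustrative, not code from this PR):

import logging

logger = logging.getLogger(__name__)

# W1514: pass an explicit encoding to open()
with open("twitter_complaints.json", encoding="utf-8") as f:
    data = f.read()

# W1203: pass lazy %-style arguments instead of interpolating an f-string,
# so the string is only built when the log level is actually enabled
logger.info("loaded %d bytes", len(data))

# C3001: define a named function instead of assigning a lambda to a variable
def format_example(example):
    return example["output"] + "</s>"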

jbusche (Collaborator) commented Feb 28, 2024

Nice @tedhtchang, I just tested and I'm seeing the same lint results as you. They're warnings and convention messages, so I would imagine they're not critical at the moment, but perhaps three issues could be created, one per type, to be addressed later as time permits:
W1514
W1203
C3001

Ssukriti (Collaborator)

@tedhtchang one change I would suggest: can we update the CI/CD check to fail only if the pylint score is < 9.0?

As you can see, the check is failing on your PR. If checks fail, we will block the PR. Expecting 10/10 may be unreasonable at the moment.
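Recent pylint releases support exactly this kind of threshold via the --fail-under option, so the lint command could be capped roughly like this (an illustrative invocation, not necessarily how this repo's tox.ini is wired):

pylint --fail-under=9.0 tuning 'scripts/*.py'

With a threshold in place, pylint still prints every message, but the check only fails when the overall score drops below the cap.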

tedhtchang (Collaborator, Author) commented Feb 29, 2024

Sure. I disabled the 3 checkers to get the lint to pass at 10/10, but covering up these warning messages is not ideal.
I like the score < 9 threshold idea.

jbusche (Collaborator) commented Feb 29, 2024

That worked well @tedhtchang

Your code has been rated at 9.69/10

.pkg: _exit> python /root/.env/lib64/python3.11/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
  lint: OK (837.33=setup[810.68]+cmd[26.64] seconds)
  congratulations :) (837.42 seconds)

hickeyma (Collaborator)

one change I would suggest: can we update the CI/CD check to fail only if the pylint score is < 9.0?

This is not good practice. I recommend looking up the pylint message first (search: pylint W1203, for example) and, if the recommended change doesn't suit for a good technical reason, disabling it with the comment: # pylint: disable=
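For example, a narrowly scoped inline disable might look like this (a hypothetical snippet, not code from this PR):

def read_raw(path):
    # Deliberate: the platform default encoding is wanted here, so silence W1514
    with open(path) as f:  # pylint: disable=unspecified-encoding
        return f.read()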

hickeyma (Collaborator)

They seem reasonable to me. Do we have to correct every one?

@tedhtchang Maybe you can't resolve every one of the warnings/errors, but you need to decide for a good reason and disable as shown in the previous comment above.

tedhtchang (Collaborator, Author)

The black formatter doesn't like those # pylint: disable=logging-fstring-interpolation or # pylint: disable=unspecified-encoding comments because they make lines longer than 100 columns :(

I ended up fixing almost every warning except:

format_dataset = lambda example: {  # pylint: disable=unnecessary-lambda-assignment
    f"{data_args.dataset_text_field}": example[f"{data_args.dataset_text_field}"]
    + tokenizer.eos_token
}

and

- https://huggingface.co/docs/peft/main/en/package_reference/lora#peft.LoraModel.add_weighted_adapter # pylint: disable=line-too-long
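One way to keep such pragmas within black's 100-column limit is pylint's disable-next comment, which goes on its own line above the offending statement (available in pylint 2.12+; a hypothetical snippet, not code from this PR):

# pylint: disable-next=unspecified-encoding
with open(path) as f:
    data = f.read()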

@Ssukriti We should manually run functional tests that exercise these modules:
sft_trainer.py, run_inference.py, merge_model_utils.py

tedhtchang (Collaborator, Author) commented Mar 1, 2024

I used this command to see if the sft_trainer module is still functional. Do we have any other functional tests to verify this module?

export CUDA_VISIBLE_DEVICES=0
export WORLD_SIZE=0
export MODEL_PATH="bigscience/bloomz-560m"
export DATA_PATH="twitter_complaints.json"
export OUTPUT_PATH="/tmp/fms-hf-tuning/output"
python -m tuning.sft_trainer  \
--model_name_or_path $MODEL_PATH  \
--data_path $DATA_PATH  \
--output_dir $OUTPUT_PATH  \
--num_train_epochs 1  \
--per_device_train_batch_size 4  \
--per_device_eval_batch_size 4  \
--gradient_accumulation_steps 4  \
--evaluation_strategy "no"  \
--save_strategy "epoch"  \
--learning_rate 1e-5  \
--weight_decay 0.0  \
--lr_scheduler_type "cosine"  \
--logging_steps 1  \
--packing false  \
--include_tokens_per_second true \
--response_template "\n### Response:"  \
--dataset_text_field "output" \
--tokenizer_name_or_path "bigscience/tokenizer" \
--use_flash_attn false \
--torch_dtype "float32" \
--peft_method "pt"

Ssukriti (Collaborator) commented Mar 5, 2024

@tedhtchang @hickeyma we have seen those conflicts between pylint and black before, and we have not been able to mandate a 10/10 pylint score. I do not want contributions to slow down by getting stuck in these kinds of conflicts and having to figure out how to disable pylint.
There are a lot of instances online that mention problems with pylint, e.g. https://pythonspeed.com/articles/pylint/ (see the "how to use pylint" section).

  1. Since we want to use pylint only to catch production errors, we can enable only a few checks, as suggested in the article (see the sketch after this list).

  2. Another way to do it: since Ted went over the changes and fixed everything he could, we cap the CI/CD check to fail only if the score drops below whatever it is at now, so that it catches only new errors going forward. If the threshold needs to be lowered in the future, we will lower it. We do not want the CI/CD check to always fail against a required score of 10, otherwise there is no use in having it.

    2.a If we go this route and ask users to fix or disable pylint checks as needed, please document how other contributors can do the same in contributions.md. This is a very fast-moving team and we need to release features every two weeks; I cannot slow down PRs by a few days because contributors can't pass the pylint check.
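For option 1, running only a narrow set of checks could look roughly like this (an illustrative command in the spirit of the article, not this repo's actual configuration):

pylint --disable=all --enable=E,unused-import,unused-variable tuning 'scripts/*.py'

This turns everything off, then re-enables the error class plus a couple of high-value warnings.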

Ssukriti (Collaborator) commented Mar 5, 2024

For functional testing, if you ran the command in your comment #63 (comment), that should be sufficient.
We will do further testing on our side; @anhuong will run one test to make sure tuning still works as part of the review.

anhuong (Collaborator) commented Mar 5, 2024

@tedhtchang the above command worked fine for me. After checking out your PR branch and installing it:

$ export CUDA_VISIBLE_DEVICES=0
$ export WORLD_SIZE=0
$ export MODEL_PATH="bigscience/bloomz-560m"
$ export DATA_PATH="/dccstor/anhtest/twitter_complaints.json"
$ export OUTPUT_PATH="/tmp/fms-hf-tuning/output"
$ python -m tuning.sft_trainer  \
> --model_name_or_path $MODEL_PATH  \
> --data_path $DATA_PATH  \
> --output_dir $OUTPUT_PATH  \
> --num_train_epochs 1  \
> --per_device_train_batch_size 4  \
> --per_device_eval_batch_size 4  \
> --gradient_accumulation_steps 4  \
> --evaluation_strategy "no"  \
> --save_strategy "epoch"  \
> --learning_rate 1e-5  \
> --weight_decay 0.0  \
> --lr_scheduler_type "cosine"  \
> --logging_steps 1  \
> --packing false  \
> --include_tokens_per_second true \
> --response_template "\n### Response:"  \
> --dataset_text_field "output" \
> --tokenizer_name_or_path "bigscience/tokenizer" \
> --use_flash_attn false \
> --torch_dtype "float32" \
> --peft_method "pt"

 UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
Map: 100%|████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 547.45 examples/s]
tokenizer_config.json: 100%|██████████████████████████████████████████████| 227/227 [00:00<00:00, 140kB/s]
tokenizer.json: 100%|████████████████████████████████████████████████| 14.5M/14.5M [00:00<00:00, 85.2MB/s]
special_tokens_map.json: 100%|█████████████████████████████████████████| 85.0/85.0 [00:00<00:00, 35.5kB/s]
Map: 100%|███████████████████████████████████████████████████████| 50/50 [00:00<00:00, 4051.22 examples/s]
/dccstor/anhtest/fms-hf-tuning/venv/lib/python3.9/site-packages/trl/trainer/sft_trainer.py:294: UserWarning: You passed a tokenizer with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `tokenizer.padding_side = 'right'` to your code.
  warnings.warn(
Detected kernel version 4.18.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
You're using a BloomTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
/lib/python3.9/site-packages/trl/trainer/utils.py:135: UserWarning: Could not find response key `[66673, 29]` in the following instance: <pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad>### Text: The wait is finally over, just joined @SquareUK, hope to get started real soon!

### Label: no complaint</s> This instance will be ignored in loss calculation. Note, if this happens often, consider increasing the `max_seq_length`.
....
....
{'train_runtime': 411.0194, 'train_samples_per_second': 0.122, 'train_steps_per_second': 0.007, 'train_tokens_per_second': 5.421, 'train_loss': 0.0, 'epoch': 0.92}
100%|██████████████████████████████████████████████████████████████████████| 3/3 [06:50<00:00, 136.93s/it]

Not sure why it didn't work for you. It also worked with the llama-7b model.

tedhtchang (Collaborator, Author) commented Mar 5, 2024

@tedhtchang the above command worked fine for me. After checking out your PR branch and installing it:
...
Not sure why it didn't work for you. It also worked with the llama-7b model.

@anhuong thanks for the review. I think it failed to run with the transformers==4.38.2 package installed; I had to downgrade to 4.37.2.

tedhtchang (Collaborator, Author) commented Mar 5, 2024

@hickeyma's recommendation is that contributors should fix, disable in-line, or disable globally, in that order of preference.

anhuong (Collaborator) left a comment

@tedhtchang Do you plan to add details on how to run and address errors from the linter in a later PR after Contributions.md has been added?

@@ -443,7 +443,9 @@ disable=raw-checker-failed,
     attribute-defined-outside-init,
     abstract-method,
     pointless-statement,
-    wrong-import-order
+    wrong-import-order,
+    duplicate-code,
anhuong (Collaborator):
Was there duplicate code in launch_training that wasn't worth fixing? It seems like a good check to have rather than disabling it globally.

tedhtchang (Collaborator, Author):
@anhuong What do you think? I can revert this change and fix launch_training.py properly in another PR.
It looked a little messy because I had to put many # pylint: disable=duplicate-code comments in those two files. I could fix it, but then I would have to figure out how to test it, which could further delay this PR.

build/launch_training.py:1:0: R0801: Similar lines in 2 files
==build.launch_training:[65:74]
==tuning.sft_trainer:[277:286]
    parser = transformers.HfArgumentParser(
        dataclass_types=(
            configs.ModelArguments,
            configs.DataArguments,
            configs.TrainingArguments,
            peft_config.LoraConfig,
            peft_config.PromptTuningConfig,
        )
    ) (duplicate-code)
build/launch_training.py:1:0: R0801: Similar lines in 2 files
==build.launch_training:[80:86]
==tuning.sft_trainer:[292:298]
        (
            model_args,
            data_args,
            training_args,
            lora_config,
            prompt_tuning_config, (duplicate-code)

anhuong (Collaborator):

That is true, I duplicated the code because passing the arguments to sft_trainer was causing issues and we wanted the same parsing of arguments. That's fine then; thanks for showing me what the output looks like.

anhuong (Collaborator) left a comment
Thanks Ted for adding the linting, explaining the changes, and for creating issue #78 for documenting the linting process.

anhuong merged commit 8e0a8f8 into foundation-model-stack:main on Mar 6, 2024. 3 checks passed.
Linked issue: Enable lint for PR