
Enable pylint in the github workflow #63

Merged: 4 commits merged into foundation-model-stack:main on Mar 6, 2024
Conversation

tedhtchang (Collaborator) commented Feb 27, 2024

Changes

  1. Add pylint to workflow
  2. Fix existing pylint errors

Closes #59
To test locally:

git clone https://github.com/tedhtchang/fms-hf-tuning -b Enable-pylint
cd fms-hf-tuning
pip install -r setup_requirement.txt
tox run -e lint

tedhtchang (Collaborator, Author)

@hickeyma What's your recommendation for the following warning/convention messages? They seem reasonable to me. Do we have to correct every one?

....
************* Module tuning.sft_trainer
tuning/sft_trainer.py:84:13: W1514: Using open without explicitly specifying an encoding (unspecified-encoding)
tuning/sft_trainer.py:171:4: W1203: Use lazy % or .format() formatting in logging functions (logging-fstring-interpolation)
tuning/sft_trainer.py:173:8: W1203: Use lazy % or .format() formatting in logging functions (logging-fstring-interpolation)
tuning/sft_trainer.py:208:21: C3001: Lambda expression assigned to a variable. Define a function using the "def" keyword instead. (unnecessary-lambda-assignment)
tuning/sft_trainer.py:215:4: W1203: Use lazy % or .format() formatting in logging functions (logging-fstring-interpolation)
tuning/sft_trainer.py:220:8: W1203: Use lazy % or .format() formatting in logging functions (logging-fstring-interpolation)
************* Module tuning.utils.merge_model_utils
tuning/utils/merge_model_utils.py:84:9: W1514: Using open without explicitly specifying an encoding (unspecified-encoding)
************* Module scripts.run_inference
scripts/run_inference.py:80:13: W1514: Using open without explicitly specifying an encoding (unspecified-encoding)
scripts/run_inference.py:84:13: W1514: Using open without explicitly specifying an encoding (unspecified-encoding)
scripts/run_inference.py:247:13: W1514: Using open without explicitly specifying an encoding (unspecified-encoding)
scripts/run_inference.py:260:9: W1514: Using open without explicitly specifying an encoding (unspecified-encoding)

....
Messages
--------
+------------------------------+------------+
|message id                    |occurrences |
+==============================+============+
|unspecified-encoding          |6           |
+------------------------------+------------+
|logging-fstring-interpolation |4           |
+------------------------------+------------+
|unnecessary-lambda-assignment |1           |
+------------------------------+------------+
-----------------------------------
Your code has been rated at 9.69/10
lint: exit 20 (15.69 seconds) /home/runner/work/fms-hf-tuning/fms-hf-tuning> pylint tuning 'scripts/*.py' pid=2242
.pkg: _exit> python /opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
  lint: FAIL code 20 (139.39=setup[123.70]+cmd[15.69] seconds)
  evaluation failed :( (139.45 seconds)
Error: Process completed with exit code 20.
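For reference, the typical fixes for these three message types look roughly like the following (a rough sketch; the file name, logger setup, and field names are illustrative, not code from this PR):

import logging

logger = logging.getLogger(__name__)

# W1514: pass an explicit encoding to open()
with open("twitter_complaints.json", encoding="utf-8") as f:
    data = f.read()

# W1203: pass lazy %-style arguments instead of interpolating an f-string,
# so the string is only built when the log level is actually enabled
logger.info("loaded %d bytes", len(data))

# C3001: define a named function instead of assigning a lambda to a variable
def format_example(example):
    return example["output"] + "</s>"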

jbusche (Collaborator) commented Feb 28, 2024

Nice @tedhtchang, I just tested and I'm seeing the same lint results as you. They're warnings and convention messages, so I would imagine they're not critical at the moment, but perhaps three issues could be created, one per type, to be addressed later as time permits:
W1514
W1203
C3001

Ssukriti (Collaborator)

@tedhtchang one change I would suggest: can we update the CI/CD check to fail only if the pylint score is < 9.0?

As you can see, the check is failing on your PR. If checks fail, we will block the PR. Expecting 10/10 may be unreasonable at the moment.
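Recent pylint releases support exactly this kind of threshold via the --fail-under option, so the lint command could be capped roughly like this (an illustrative invocation, not necessarily how this repo's tox.ini is wired):

pylint --fail-under=9.0 tuning 'scripts/*.py'

With a threshold in place, pylint still prints every message, but the check only fails when the overall score drops below the cap.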

tedhtchang (Collaborator, Author) commented Feb 29, 2024

Sure. I disabled the 3 checkers to get the lint to pass at 10/10, but covering up these warning messages is not ideal.
I like the score < 9 threshold idea.

jbusche (Collaborator) commented Feb 29, 2024

That worked well @tedhtchang

Your code has been rated at 9.69/10

.pkg: _exit> python /root/.env/lib64/python3.11/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
  lint: OK (837.33=setup[810.68]+cmd[26.64] seconds)
  congratulations :) (837.42 seconds)

hickeyma (Collaborator)

one change I would suggest: can we update the CI/CD check to fail only if the pylint score is < 9.0?

This is not good practice. I recommend looking up the pylint message first (search: pylint W1203, for example) and, if the recommended change doesn't suit for a good technical reason, disabling it with the comment: # pylint: disable=
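For example, a narrowly scoped inline disable might look like this (a hypothetical snippet, not code from this PR):

def read_raw(path):
    # Deliberate: the platform default encoding is wanted here, so silence W1514
    with open(path) as f:  # pylint: disable=unspecified-encoding
        return f.read()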

hickeyma (Collaborator)

They seem reasonable to me. Do we have to correct every one?

@tedhtchang Maybe you can't resolve every one of the warnings/errors, but you need to decide for a good reason and disable as shown in the previous comment above.

tedhtchang (Collaborator, Author)

The black formatter doesn't like those # pylint: disable=logging-fstring-interpolation or # pylint: disable=unspecified-encoding comments because they make lines longer than 100 columns :(

I ended up fixing almost every warning except:

format_dataset = lambda example: {  # pylint: disable=unnecessary-lambda-assignment
    f"{data_args.dataset_text_field}": example[f"{data_args.dataset_text_field}"]
    + tokenizer.eos_token
}

and

- https://huggingface.co/docs/peft/main/en/package_reference/lora#peft.LoraModel.add_weighted_adapter # pylint: disable=line-too-long
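One way to keep such pragmas within black's 100-column limit is pylint's disable-next comment, which goes on its own line above the offending statement (available in pylint 2.12+; a hypothetical snippet, not code from this PR):

# pylint: disable-next=unspecified-encoding
with open(path) as f:
    data = f.read()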

@Ssukriti We should manually run functional tests that exercise these modules:
sft_trainer.py, run_inference.py, merge_model_utils.py

tedhtchang (Collaborator, Author) commented Mar 1, 2024

I used this command to see if the sft_trainer module is still functional. Do we have any other functional tests to verify this module?

export CUDA_VISIBLE_DEVICES=0
export WORLD_SIZE=0
export MODEL_PATH="bigscience/bloomz-560m"
export DATA_PATH="twitter_complaints.json"
export OUTPUT_PATH="/tmp/fms-hf-tuning/output"
python -m tuning.sft_trainer  \
--model_name_or_path $MODEL_PATH  \
--data_path $DATA_PATH  \
--output_dir $OUTPUT_PATH  \
--num_train_epochs 1  \
--per_device_train_batch_size 4  \
--per_device_eval_batch_size 4  \
--gradient_accumulation_steps 4  \
--evaluation_strategy "no"  \
--save_strategy "epoch"  \
--learning_rate 1e-5  \
--weight_decay 0.0  \
--lr_scheduler_type "cosine"  \
--logging_steps 1  \
--packing false  \
--include_tokens_per_second true \
--response_template "\n### Response:"  \
--dataset_text_field "output" \
--tokenizer_name_or_path "bigscience/tokenizer" \
--use_flash_attn false \
--torch_dtype "float32" \
--peft_method "pt"

Ssukriti (Collaborator) commented Mar 5, 2024

@tedhtchang @hickeyma we have seen those conflicts between pylint and black before, and we have not been able to mandate a 10/10 pylint score. I do not want contributions to slow down by getting stuck in these kinds of conflicts and having to figure out how to disable pylint.
There are a lot of instances online that mention problems with pylint, e.g. https://pythonspeed.com/articles/pylint/ (see the "how to use pylint" section).

  1. Since we want to use pylint only to catch production errors, we can enable only a few checks, as suggested in the article (see the sketch after this list).

  2. Another way to do it: since Ted went over the changes and fixed everything he could, we cap the CI/CD check to fail only if the score drops below whatever it is at now, so that it catches only new errors going forward. If the threshold needs to be lowered in the future, we will lower it. We do not want the CI/CD check to always fail against a required score of 10, otherwise there is no use in having it.

    2.a If we go this route and ask users to fix or disable pylint checks as needed, please document how other contributors can do the same in contributions.md. This is a very fast-moving team and we need to release features every two weeks; I cannot slow down PRs by a few days because contributors can't pass the pylint check.
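For option 1, running only a narrow set of checks could look roughly like this (an illustrative command in the spirit of the article, not this repo's actual configuration):

pylint --disable=all --enable=E,unused-import,unused-variable tuning 'scripts/*.py'

This turns everything off, then re-enables the error class plus a couple of high-value warnings.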

Ssukriti (Collaborator) commented Mar 5, 2024

For functional testing, if you ran the command in your comment #63 (comment), that should be sufficient.
We will do further testing on our side; @anhuong will run one test to make sure tuning still works as part of the review.

anhuong (Collaborator) commented Mar 5, 2024

@tedhtchang the above command worked fine for me. After checking out your PR branch and installing it:

$ export CUDA_VISIBLE_DEVICES=0
$ export WORLD_SIZE=0
$ export MODEL_PATH="bigscience/bloomz-560m"
$ export DATA_PATH="/dccstor/anhtest/twitter_complaints.json"
$ export OUTPUT_PATH="/tmp/fms-hf-tuning/output"
$ python -m tuning.sft_trainer  \
> --model_name_or_path $MODEL_PATH  \
> --data_path $DATA_PATH  \
> --output_dir $OUTPUT_PATH  \
> --num_train_epochs 1  \
> --per_device_train_batch_size 4  \
> --per_device_eval_batch_size 4  \
> --gradient_accumulation_steps 4  \
> --evaluation_strategy "no"  \
> --save_strategy "epoch"  \
> --learning_rate 1e-5  \
> --weight_decay 0.0  \
> --lr_scheduler_type "cosine"  \
> --logging_steps 1  \
> --packing false  \
> --include_tokens_per_second true \
> --response_template "\n### Response:"  \
> --dataset_text_field "output" \
> --tokenizer_name_or_path "bigscience/tokenizer" \
> --use_flash_attn false \
> --torch_dtype "float32" \
> --peft_method "pt"

 UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
Map: 100%|████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 547.45 examples/s]
tokenizer_config.json: 100%|██████████████████████████████████████████████| 227/227 [00:00<00:00, 140kB/s]
tokenizer.json: 100%|████████████████████████████████████████████████| 14.5M/14.5M [00:00<00:00, 85.2MB/s]
special_tokens_map.json: 100%|█████████████████████████████████████████| 85.0/85.0 [00:00<00:00, 35.5kB/s]
Map: 100%|███████████████████████████████████████████████████████| 50/50 [00:00<00:00, 4051.22 examples/s]
/dccstor/anhtest/fms-hf-tuning/venv/lib/python3.9/site-packages/trl/trainer/sft_trainer.py:294: UserWarning: You passed a tokenizer with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `tokenizer.padding_side = 'right'` to your code.
  warnings.warn(
Detected kernel version 4.18.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
You're using a BloomTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
/lib/python3.9/site-packages/trl/trainer/utils.py:135: UserWarning: Could not find response key `[66673, 29]` in the following instance: <pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad>### Text: The wait is finally over, just joined @SquareUK, hope to get started real soon!

### Label: no complaint</s> This instance will be ignored in loss calculation. Note, if this happens often, consider increasing the `max_seq_length`.
....
....
{'train_runtime': 411.0194, 'train_samples_per_second': 0.122, 'train_steps_per_second': 0.007, 'train_tokens_per_second': 5.421, 'train_loss': 0.0, 'epoch': 0.92}
100%|██████████████████████████████████████████████████████████████████████| 3/3 [06:50<00:00, 136.93s/it]

Not sure why it didn't work for you. It also worked with the llama-7b model.

tedhtchang (Collaborator, Author) commented Mar 5, 2024

@tedhtchang the above command worked fine for me. After checking out your PR branch and installing it:
...
Not sure why it didn't work for you. It also worked with the llama-7b model.

@anhuong thanks for the review. I think it failed to run with the transformers==4.38.2 package installed; I had to downgrade to 4.37.2.

tedhtchang (Collaborator, Author) commented Mar 5, 2024

@hickeyma's recommendation is that contributors should fix, disable in-line, or disable globally, in that order of preference.

anhuong (Collaborator) left a comment

@tedhtchang Do you plan to add details on how to run and address errors from the linter in a later PR after Contributions.md has been added?

@@ -443,7 +443,9 @@ disable=raw-checker-failed,
     attribute-defined-outside-init,
     abstract-method,
     pointless-statement,
-    wrong-import-order
+    wrong-import-order,
+    duplicate-code,
anhuong (Collaborator):
Was there duplicate code in launch_training that wasn't worth fixing? It seems like a good check to have rather than disabling it globally.

tedhtchang (Collaborator, Author):
@anhuong What do you think? I can revert this change and fix launch_training.py properly in another PR.
It looked a little messy because I had to put many # pylint: disable=duplicate-code comments in those two files. I could fix it, but then I would have to figure out how to test it, which could further delay this PR.

build/launch_training.py:1:0: R0801: Similar lines in 2 files
==build.launch_training:[65:74]
==tuning.sft_trainer:[277:286]
    parser = transformers.HfArgumentParser(
        dataclass_types=(
            configs.ModelArguments,
            configs.DataArguments,
            configs.TrainingArguments,
            peft_config.LoraConfig,
            peft_config.PromptTuningConfig,
        )
    ) (duplicate-code)
build/launch_training.py:1:0: R0801: Similar lines in 2 files
==build.launch_training:[80:86]
==tuning.sft_trainer:[292:298]
        (
            model_args,
            data_args,
            training_args,
            lora_config,
            prompt_tuning_config, (duplicate-code)

anhuong (Collaborator):

That is true, I duplicated the code because passing the arguments to sft_trainer was causing issues and we wanted the same parsing of arguments. That's fine then; thanks for showing me what the output looks like.

anhuong (Collaborator) left a comment
Thanks Ted for adding the linting, explaining the changes, and for creating issue #78 for documenting the linting process.

anhuong merged commit 8e0a8f8 into foundation-model-stack:main on Mar 6, 2024. 3 checks passed.
Linked issue: Enable lint for PR