
LightningCLI doesn't fail when config.yaml contains invalid arguments #20337

Open · adosar opened this issue Oct 11, 2024 · 0 comments

Labels: bug (Something isn't working) · needs triage (Waiting to be triaged by maintainers) · ver: 2.4.x

adosar (Contributor) commented Oct 11, 2024

Bug description

While experimenting with the LightningCLI, I found that it still runs even when config.yaml contains values of invalid types. For example, max_epochs for Trainer should be an int, yet fitting succeeds with a str in the .yaml. In the MWE below, config.yaml contains a str for both seed_everything and max_epochs. This is also evident when reading the config.yaml file back:

```python
import yaml

# Read back the config consumed by LightningCLI; note the values are strings.
# yaml.load() without a Loader raises a TypeError on PyYAML >= 6, so use safe_load.
with open('config.yaml', 'r') as fhand:
    data = yaml.safe_load(fhand)

print(data)
# {'seed_everything': '1042', 'trainer': {'max_epochs': '2'}}
```

Note

I am not sure whether this is really a bug, since LightningCLI might convert the given values to the correct types based on the type hints. However, I couldn't find whether this is actually the case.
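LightningCLI builds its parser on jsonargparse, so a quick way to probe this guess outside Lightning is to check whether jsonargparse itself coerces a quoted string into an annotated int. This is only a sketch: the parser below knows nothing about Lightning, and the argument name merely mirrors the Trainer option.

```python
from jsonargparse import ArgumentParser

# Stand-in for a Trainer option annotated as int.
parser = ArgumentParser()
parser.add_argument('--trainer.max_epochs', type=int)

# Parse the same YAML content as in the MWE, with the value quoted as a str.
cfg = parser.parse_string('trainer:\n  max_epochs: "2"\n')
print(cfg.trainer.max_epochs, type(cfg.trainer.max_epochs))
```

If this prints `2 <class 'int'>`, the string is being coerced at parse time rather than rejected, which would explain why the run below succeeds.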

What version are you seeing the problem on?

v2.4

How to reproduce the bug

```python
# main.py
from lightning.pytorch.cli import LightningCLI

# simple demo classes for your convenience
from lightning.pytorch.demos.boring_classes import DemoModel, BoringDataModule


def cli_main():
    cli = LightningCLI(DemoModel, BoringDataModule)
    # note: don't call fit!!


if __name__ == "__main__":
    cli_main()
    # note: it is good practice to implement the CLI in a function and call it in the main if block
```

```yaml
# config.yaml
seed_everything: "1042"

trainer:
  max_epochs: "2"
```
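For reference, this is the same config with the values written as native YAML ints (simply unquoted), which is what the type hints on seed_everything and Trainer.max_epochs expect:

```yaml
# config.yaml with the expected int types
seed_everything: 1042

trainer:
  max_epochs: 2
```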

Now from the CLI:

```bash
python main.py fit --config=config.yaml
```
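One way to inspect what the parser actually resolved is the standard --print_config flag that LightningCLI inherits from jsonargparse; if the dumped values appear unquoted here, the strings were coerced to ints during parsing:

```bash
python main.py fit --config=config.yaml --print_config
```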


Error messages and logs

```
(aidsorb) [ansar@mofinium ligthning_bug]$ python main.py fit --config=config.yaml
Seed set to 1042
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/home/ansar/venvir/aidsorb/lib64/python3.11/site-packages/lightning/pytorch/trainer/configuration_validator.py:68: You passed in a val_dataloader but have no validation_step. Skipping val loop.
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
[rank: 1] Seed set to 1042
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2

distributed_backend=nccl
All distributed processes registered. Starting with 2 processes

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1]

  | Name | Type   | Params | Mode
---------------------------------------
0 | l1   | Linear | 330    | train
---------------------------------------
330       Trainable params
0         Non-trainable params
330       Total params
0.001     Total estimated model params size (MB)
1         Modules in train mode
0         Modules in eval mode
/home/ansar/venvir/aidsorb/lib64/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument to `num_workers=9` in the `DataLoader` to improve performance.
/home/ansar/venvir/aidsorb/lib64/python3.11/site-packages/lightning/pytorch/loops/fit_loop.py:298: The number of training batches (32) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
Epoch 1: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 32/32 [00:00<00:00, 1100.86it/s, v_num=3]`Trainer.fit` stopped: `max_epochs=2` reached.
Epoch 1: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 32/32 [00:00<00:00, 1045.92it/s, v_num
```



Environment

<details>
  <summary>Current environment</summary>

* CUDA:
	- GPU:
		- Quadro RTX 4000
		- Quadro RTX 4000
	- available:         True
	- version:           12.1
* Lightning:
	- lightning:         2.4.0
	- lightning-utilities: 0.11.7
	- pytorch-lightning: 2.4.0
	- torch:             2.4.1
	- torchmetrics:      1.4.3
	- torchvision:       0.19.1
* Packages:
	- absl-py:           2.1.0
	- aidsorb:           1.0.0
	- aiohappyeyeballs:  2.4.3
	- aiohttp:           3.10.9
	- aiosignal:         1.3.1
	- ase:               3.23.0
	- attrs:             24.2.0
	- contourpy:         1.3.0
	- cycler:            0.12.1
	- docstring-parser:  0.16
	- filelock:          3.16.1
	- fire:              0.7.0
	- fonttools:         4.54.1
	- frozenlist:        1.4.1
	- fsspec:            2024.9.0
	- grpcio:            1.66.2
	- idna:              3.10
	- importlib-resources: 6.4.5
	- jinja2:            3.1.4
	- jsonargparse:      4.33.2
	- kiwisolver:        1.4.7
	- lightning:         2.4.0
	- lightning-utilities: 0.11.7
	- markdown:          3.7
	- markupsafe:        3.0.1
	- matplotlib:        3.9.2
	- mpmath:            1.3.0
	- multidict:         6.1.0
	- networkx:          3.3
	- numpy:             1.26.4
	- nvidia-cublas-cu12: 12.1.3.1
	- nvidia-cuda-cupti-cu12: 12.1.105
	- nvidia-cuda-nvrtc-cu12: 12.1.105
	- nvidia-cuda-runtime-cu12: 12.1.105
	- nvidia-cudnn-cu12: 9.1.0.70
	- nvidia-cufft-cu12: 11.0.2.54
	- nvidia-curand-cu12: 10.3.2.106
	- nvidia-cusolver-cu12: 11.4.5.107
	- nvidia-cusparse-cu12: 12.1.0.106
	- nvidia-nccl-cu12:  2.20.5
	- nvidia-nvjitlink-cu12: 12.6.77
	- nvidia-nvtx-cu12:  12.1.105
	- packaging:         24.1
	- pandas:            2.2.3
	- pillow:            10.4.0
	- pip:               24.2
	- plotly:            5.24.1
	- propcache:         0.2.0
	- protobuf:          5.28.2
	- pyparsing:         3.1.4
	- python-dateutil:   2.9.0.post0
	- pytorch-lightning: 2.4.0
	- pytz:              2024.2
	- pyyaml:            6.0.2
	- scipy:             1.14.1
	- setuptools:        65.5.1
	- six:               1.16.0
	- sympy:             1.13.3
	- tenacity:          9.0.0
	- tensorboard:       2.18.0
	- tensorboard-data-server: 0.7.2
	- termcolor:         2.5.0
	- torch:             2.4.1
	- torchmetrics:      1.4.3
	- torchvision:       0.19.1
	- tqdm:              4.66.5
	- triton:            3.0.0
	- typeshed-client:   2.7.0
	- typing-extensions: 4.12.2
	- tzdata:            2024.2
	- werkzeug:          3.0.4
	- yarl:              1.14.0
* System:
	- OS:                Linux
	- architecture:
		- 64bit
		- ELF
	- processor:         x86_64
	- python:            3.11.7
	- release:           5.14.0-427.16.1.el9_4.x86_64
	- version:           #1 SMP PREEMPT_DYNAMIC Wed May 8 17:48:14 UTC 2024

</details>

More info

_No response_