Everything prints fine, but the loss doesn't descend #20344

Open
2catycm opened this issue Oct 15, 2024 · 6 comments
Labels
bug (Something isn't working), needs triage (Waiting to be triaged by maintainers), ver: 2.3.x

Comments

2catycm commented Oct 15, 2024

Bug description

Even after setting the learning rate to 1, and even to 100, the loss does not change at all: it stays at about 4.60.
I stepped through training in a debugger, and everything appears to work: the loss is backpropagated successfully, the gradients of every parameter look reasonable, and the optimizer is indeed called.
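
For reference, 4.60 is very close to ln(100) ≈ 4.605, which is the cross-entropy of a uniform guess over 100 classes (the classification head in the model summary has 100 outputs). The sketch below is a plain-Python check of that arithmetic only; the `cross_entropy` helper is written for illustration and is not part of the reproduction code.

```python
import math


def cross_entropy(logits, target):
    """Per-sample cross-entropy: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


num_classes = 100  # assumption: read off the head shape [100, 768]

# Identical logits give a uniform softmax, i.e. a model that has learned nothing.
uniform_loss = cross_entropy([0.0] * num_classes, target=3)
print(round(uniform_loss, 3))  # 4.605
```

A loss pinned at this value suggests the predictions stay close to uniform, whatever the optimizer does.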

What version are you seeing the problem on?

v2.3

How to reproduce the bug

class ClassificationTask(L.LightningModule):
    def __init__(self, config: ClassificationTaskConfig)->None:
        super().__init__()
        self.save_hyperparameters(config.model_dump())
        L.seed_everything(config.experiment_index) # use index as the seed for reproducibility
        self.lit_data:ClassificationDataModule = config.dataset_config.get_lightning_data_module()
        config.cls_model_config.num_of_classes = self.lit_data.num_of_classes
        self.cls_model:HuggingfaceModel = config.cls_model_config.get_cls_model()
        self.lit_data.set_transform_from_hf_image_preprocessor(hf_image_preprocessor=self.cls_model.image_preprocessor)
        
        model_image_size:tuple[int, int] = (self.cls_model.image_preprocessor.size['height'], self.cls_model.image_preprocessor.size['width'])
        self.example_input_array = torch.Tensor(1, self.cls_model.backbone.config.num_channels, *model_image_size)
        
        self.softmax = nn.Softmax(dim=1)    
        self.loss = nn.CrossEntropyLoss(label_smoothing=config.label_smoothing)
        
        self.automatic_optimization = False # The problem occurs when True, so I tried to use False to see what happens
    
    def compute_model_logits(self, image_tensor:torch.Tensor)-> torch.Tensor:
        return self.cls_model(image_tensor)
    
    @override
    def forward(self, image_tensor:torch.Tensor, *args, **kwargs)-> torch.Tensor:
        return self.softmax(self.compute_model_logits(image_tensor))

    def forward_loss(self, image_tensor: torch.Tensor, label_tensor:torch.Tensor)->torch.Tensor:
        probs = self(image_tensor)
        # return F.nll_loss(logits, label_tensor)
        return self.loss(probs, label_tensor)
    
    @override
    def training_step(self, batch, batch_idx=None, *args, **kwargs)-> STEP_OUTPUT:
        self.train()
        opt = self.optimizers()
        opt.zero_grad()
        
        loss = self.forward_loss(*batch)
        self.log("train_loss", loss, prog_bar=True)
        # self.manual_backward(loss)
        loss.backward()
        opt.step()
        return loss

    @override    
    def configure_optimizers(self) -> OptimizerLRScheduler:
        return torch.optim.AdamW(self.parameters(), lr=self.hparams.learning_rate)

# Training script that uses the module defined above
from .core import ClassificationTask, ClassificationTaskConfig
config = ClassificationTaskConfig()
config.learning_rate = 3e-4 # a sensible value, but the loss does not decrease
config.learning_rate = 1000 # deliberately huge for debugging: a NaN would be expected if the weights were really being optimized
config.dataset_config.batch_size = 64
cls_task = ClassificationTask(config)

import lightning as L
from .utils import runs_path
from lightning.pytorch.callbacks.early_stopping import EarlyStopping
from lightning.pytorch.callbacks import ModelSummary, StochasticWeightAveraging, DeviceStatsMonitor
from lightning.pytorch.loggers import TensorBoardLogger, CSVLogger
trainer = L.Trainer(
    default_root_dir=runs_path,
    enable_checkpointing=True,
    enable_model_summary=True,
    num_sanity_val_steps=2,
    callbacks=[
        EarlyStopping(
            monitor="val_acc1",
            mode="max",
            check_finite=True,
            patience=5,
            check_on_train_epoch_end=False,  # check on validation end
            verbose=True,
        ),
        ModelSummary(max_depth=3),
        DeviceStatsMonitor(cpu_stats=True),
    ],
    logger=[TensorBoardLogger(save_dir=runs_path / "tensorboard"), CSVLogger(save_dir=runs_path)],
)
trainer.fit(cls_task, datamodule=cls_task.lit_data)
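
One thing that may be worth checking in the module above, offered as a hypothesis rather than a confirmed diagnosis: `forward` applies `nn.Softmax`, and `forward_loss` then passes those probabilities to `nn.CrossEntropyLoss`, which expects unnormalized logits and applies log-softmax internally. With inputs confined to [0, 1], the loss for 100 classes can only move between roughly 3.62 and 4.62, however large the learning rate. The plain-Python sketch below illustrates the effect; `softmax` and `cross_entropy` are illustrative helpers, label smoothing is ignored, and 100 classes is an assumption taken from the head shape in the model summary.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, target):
    """Per-sample -log(softmax(scores)[target]); scores are treated as logits."""
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[target]


num_classes = 100  # assumption: read off the head shape [100, 768]
target = 7         # arbitrary class index for the illustration

# Two extreme predictions: confidently right and confidently wrong.
confident_right = [0.0] * num_classes
confident_right[target] = 50.0
confident_wrong = [0.0] * num_classes
confident_wrong[0] = 50.0

for name, logits in [("right", confident_right), ("wrong", confident_wrong)]:
    on_logits = cross_entropy(logits, target)          # loss computed on raw logits
    on_probs = cross_entropy(softmax(logits), target)  # loss computed on softmax output
    print(f"{name}: on logits = {on_logits:.3f}, on probabilities = {on_probs:.3f}")
```

On raw logits the two cases are far apart, while on probabilities both stay within about one unit of ln(100). If this is the cause here, the usual arrangement would be to pass the output of `compute_model_logits` to the loss and keep the softmax for inference only.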

Error messages and logs

root
└── cls_model (HuggingfaceModel)
    ├── backbone (ViTModel)
    │   ├── embeddings (ViTEmbeddings) cls_token:[1, 1, 768] position_embeddings:[1, 197, 768]
    │   │   └── patch_embeddings (ViTPatchEmbeddings)
    │   │       └── projection (Conv2d) weight:[768, 3, 16, 16] bias:[768]
    │   ├── encoder (ViTEncoder)
    │   │   └── layer (ModuleList)
    │   │       └── 0-11(ViTLayer)
    │   │           ├── attention (ViTAttention)
    │   │           │   ├── attention (ViTSelfAttention)
    │   │           │   │   └── query,key,value(Linear) weight:[768, 768] bias:[768]
    │   │           │   └── output (ViTSelfOutput)
    │   │           │       └── dense (Linear) weight:[768, 768] bias:[768]
    │   │           ├── intermediate (ViTIntermediate)
    │   │           │   └── dense (Linear) weight:[3072, 768] bias:[3072]
    │   │           ├── output (ViTOutput)
    │   │           │   └── dense (Linear) weight:[768, 3072] bias:[768]
    │   │           └── layernorm_before,layernorm_after(LayerNorm) weight:[768] bias:[768]
    │   ├── layernorm (LayerNorm) weight:[768] bias:[768]
    │   └── pooler (ViTPooler)
    │       └── dense (Linear) weight:[768, 768] bias:[768]
    └── head (Linear) weight:[100, 768] bias:[100]
Files already downloaded and verified
Files already downloaded and verified
202

Sanity Checking: |          | 0/? [00:00<?, ?it/s]
Sanity Checking:   0%|          | 0/2 [00:00<?, ?it/s]
Sanity Checking DataLoader 0:   0%|          | 0/2 [00:00<?, ?it/s]
Sanity Checking DataLoader 0:  50%|█████     | 1/2 [00:00<00:00,  1.78it/s]
Sanity Checking DataLoader 0: 100%|██████████| 2/2 [00:00<00:00,  2.78it/s]
                                                                           

Training: |          | 0/? [00:00<?, ?it/s]
Training:   0%|          | 0/704 [00:00<?, ?it/s]
Epoch 0:   0%|          | 0/704 [00:00<?, ?it/s] 
Epoch 0:   0%|          | 1/704 [00:02<30:06,  0.39it/s]
Epoch 0:   0%|          | 1/704 [00:02<30:07,  0.39it/s, v_num=11, train_loss=4.610]
Epoch 0:   0%|          | 2/704 [00:03<17:54,  0.65it/s, v_num=11, train_loss=4.610]
Epoch 0:   0%|          | 2/704 [00:03<17:55,  0.65it/s, v_num=11, train_loss=4.610]
Epoch 0:   0%|          | 3/704 [00:03<13:59,  0.84it/s, v_num=11, train_loss=4.610]
Epoch 0:   0%|          | 3/704 [00:03<14:01,  0.83it/s, v_num=11, train_loss=4.600]
Epoch 0:   1%|          | 4/704 [00:03<11:26,  1.02it/s, v_num=11, train_loss=4.600]
Epoch 0:   1%|          | 4/704 [00:04<11:49,  0.99it/s, v_num=11, train_loss=4.610]
Epoch 0:   1%|          | 5/704 [00:04<09:31,  1.22it/s, v_num=11, train_loss=4.610]
Epoch 0:   1%|          | 5/704 [00:04<10:25,  1.12it/s, v_num=11, train_loss=4.600]
Epoch 0:   1%|          | 6/704 [00:04<08:46,  1.33it/s, v_num=11, train_loss=4.600]
Epoch 0:   1%|          | 6/704 [00:04<09:30,  1.22it/s, v_num=11, train_loss=4.600]
Epoch 0:   1%|          | 7/704 [00:04<08:11,  1.42it/s, v_num=11, train_loss=4.600]
Epoch 0:   1%|          | 7/704 [00:05<08:50,  1.31it/s, v_num=11, train_loss=4.610]
Epoch 0:   1%|          | 8/704 [00:05<07:52,  1.47it/s, v_num=11, train_loss=4.610]
Epoch 0:   1%|          | 8/704 [00:05<08:22,  1.39it/s, v_num=11, train_loss=4.600]
Epoch 0:   1%|▏         | 9/704 [00:05<07:35,  1.53it/s, v_num=11, train_loss=4.600]
Epoch 0:   1%|▏         | 9/704 [00:06<07:58,  1.45it/s, v_num=11, train_loss=4.610]
Epoch 0:   1%|▏         | 10/704 [00:06<07:18,  1.58it/s, v_num=11, train_loss=4.610]
Epoch 0:   1%|▏         | 10/704 [00:06<07:39,  1.51it/s, v_num=11, train_loss=4.610]
Epoch 0:   2%|▏         | 11/704 [00:06<06:59,  1.65it/s, v_num=11, train_loss=4.610]
Epoch 0:   2%|▏         | 11/704 [00:07<07:23,  1.56it/s, v_num=11, train_loss=4.610]
Epoch 0:   2%|▏         | 12/704 [00:07<06:48,  1.70it/s, v_num=11, train_loss=4.610]
Epoch 0:   2%|▏         | 12/704 [00:07<07:10,  1.61it/s, v_num=11, train_loss=4.610]
Epoch 0:   2%|▏         | 13/704 [00:07<06:39,  1.73it/s, v_num=11, train_loss=4.610]
Epoch 0:   2%|▏         | 13/704 [00:07<06:59,  1.65it/s, v_num=11, train_loss=4.600]
Epoch 0:   2%|▏         | 14/704 [00:07<06:30,  1.77it/s, v_num=11, train_loss=4.600]
Epoch 0:   2%|▏         | 14/704 [00:08<06:49,  1.68it/s, v_num=11, train_loss=4.600]
Epoch 0:   2%|▏         | 15/704 [00:08<06:23,  1.80it/s, v_num=11, train_loss=4.600]
Epoch 0:   2%|▏         | 15/704 [00:08<06:41,  1.72it/s, v_num=11, train_loss=4.600]
Epoch 0:   2%|▏         | 16/704 [00:08<06:16,  1.83it/s, v_num=11, train_loss=4.600]
Epoch 0:   2%|▏         | 16/704 [00:09<06:33,  1.75it/s, v_num=11, train_loss=4.610]
Epoch 0:   2%|▏         | 17/704 [00:09<06:11,  1.85it/s, v_num=11, train_loss=4.610]
Epoch 0:   2%|▏         | 17/704 [00:09<06:27,  1.77it/s, v_num=11, train_loss=4.600]
Epoch 0:   3%|▎         | 18/704 [00:09<06:06,  1.87it/s, v_num=11, train_loss=4.600]
Epoch 0:   3%|▎         | 18/704 [00:10<06:21,  1.80it/s, v_num=11, train_loss=4.600]
Epoch 0:   3%|▎         | 19/704 [00:10<06:02,  1.89it/s, v_num=11, train_loss=4.600]
Epoch 0:   3%|▎         | 19/704 [00:10<06:15,  1.82it/s, v_num=11, train_loss=4.610]
Epoch 0:   3%|▎         | 20/704 [00:10<05:57,  1.91it/s, v_num=11, train_loss=4.610]
Epoch 0:   3%|▎         | 20/704 [00:10<06:10,  1.84it/s, v_num=11, train_loss=4.610]
Epoch 0:   3%|▎         | 21/704 [00:10<05:53,  1.93it/s, v_num=11, train_loss=4.610]
Epoch 0:   3%|▎         | 21/704 [00:11<06:06,  1.86it/s, v_num=11, train_loss=4.600]
Epoch 0:   3%|▎         | 22/704 [00:11<05:50,  1.95it/s, v_num=11, train_loss=4.600]
Epoch 0:   3%|▎         | 22/704 [00:11<06:02,  1.88it/s, v_num=11, train_loss=4.610]
Epoch 0:   3%|▎         | 23/704 [00:11<05:48,  1.95it/s, v_num=11, train_loss=4.610]
Epoch 0:   3%|▎         | 23/704 [00:12<05:58,  1.90it/s, v_num=11, train_loss=4.600]
Epoch 0:   3%|▎         | 24/704 [00:12<05:44,  1.97it/s, v_num=11, train_loss=4.600]
Epoch 0:   3%|▎         | 24/704 [00:12<05:55,  1.91it/s, v_num=11, train_loss=4.600]
Epoch 0:   4%|▎         | 25/704 [00:12<05:41,  1.99it/s, v_num=11, train_loss=4.600]
Epoch 0:   4%|▎         | 25/704 [00:12<05:52,  1.93it/s, v_num=11, train_loss=4.610]
Epoch 0:   4%|▎         | 26/704 [00:13<05:39,  2.00it/s, v_num=11, train_loss=4.610]
Epoch 0:   4%|▎         | 26/704 [00:13<05:49,  1.94it/s, v_num=11, train_loss=4.600]
Epoch 0:   4%|▍         | 27/704 [00:13<05:36,  2.01it/s, v_num=11, train_loss=4.600]
Epoch 0:   4%|▍         | 27/704 [00:13<05:46,  1.95it/s, v_num=11, train_loss=4.600]
Epoch 0:   4%|▍         | 28/704 [00:13<05:34,  2.02it/s, v_num=11, train_loss=4.600]
Epoch 0:   4%|▍         | 28/704 [00:14<05:43,  1.97it/s, v_num=11, train_loss=4.600]
Epoch 0:   4%|▍         | 29/704 [00:14<05:32,  2.03it/s, v_num=11, train_loss=4.600]
Epoch 0:   4%|▍         | 29/704 [00:14<05:41,  1.98it/s, v_num=11, train_loss=4.600]
Epoch 0:   4%|▍         | 30/704 [00:14<05:30,  2.04it/s, v_num=11, train_loss=4.600]
Epoch 0:   4%|▍         | 30/704 [00:15<05:39,  1.99it/s, v_num=11, train_loss=4.600]
Epoch 0:   4%|▍         | 31/704 [00:15<05:30,  2.03it/s, v_num=11, train_loss=4.600]
Epoch 0:   4%|▍         | 31/704 [00:15<05:36,  2.00it/s, v_num=11, train_loss=4.600]
Epoch 0:   5%|▍         | 32/704 [00:15<05:27,  2.05it/s, v_num=11, train_loss=4.600]
Epoch 0:   5%|▍         | 32/704 [00:15<05:34,  2.01it/s, v_num=11, train_loss=4.600]
Epoch 0:   5%|▍         | 33/704 [00:15<05:24,  2.07it/s, v_num=11, train_loss=4.600]
Epoch 0:   5%|▍         | 33/704 [00:16<05:32,  2.02it/s, v_num=11, train_loss=4.600]
Epoch 0:   5%|▍         | 34/704 [00:16<05:23,  2.07it/s, v_num=11, train_loss=4.600]
Epoch 0:   5%|▍         | 34/704 [00:16<05:30,  2.03it/s, v_num=11, train_loss=4.610]
Epoch 0:   5%|▍         | 35/704 [00:16<05:21,  2.08it/s, v_num=11, train_loss=4.610]
Epoch 0:   5%|▍         | 35/704 [00:17<05:29,  2.03it/s, v_num=11, train_loss=4.600]
Epoch 0:   5%|▌         | 36/704 [00:17<05:20,  2.09it/s, v_num=11, train_loss=4.600]
Epoch 0:   5%|▌         | 36/704 [00:17<05:27,  2.04it/s, v_num=11, train_loss=4.600]
Epoch 0:   5%|▌         | 37/704 [00:17<05:18,  2.09it/s, v_num=11, train_loss=4.600]
Epoch 0:   5%|▌         | 37/704 [00:18<05:25,  2.05it/s, v_num=11, train_loss=4.600]
Epoch 0:   5%|▌         | 38/704 [00:18<05:17,  2.10it/s, v_num=11, train_loss=4.600]
Epoch 0:   5%|▌         | 38/704 [00:18<05:24,  2.06it/s, v_num=11, train_loss=4.600]
Epoch 0:   6%|▌         | 39/704 [00:18<05:15,  2.11it/s, v_num=11, train_loss=4.600]
Epoch 0:   6%|▌         | 39/704 [00:18<05:22,  2.06it/s, v_num=11, train_loss=4.600]
Epoch 0:   6%|▌         | 40/704 [00:18<05:15,  2.11it/s, v_num=11, train_loss=4.600]
Epoch 0:   6%|▌         | 40/704 [00:19<05:21,  2.07it/s, v_num=11, train_loss=4.600]
Epoch 0:   6%|▌         | 41/704 [00:19<05:13,  2.12it/s, v_num=11, train_loss=4.600]
Epoch 0:   6%|▌         | 41/704 [00:19<05:19,  2.07it/s, v_num=11, train_loss=4.610]
Epoch 0:   6%|▌         | 42/704 [00:19<05:12,  2.12it/s, v_num=11, train_loss=4.610]
Epoch 0:   6%|▌         | 42/704 [00:20<05:18,  2.08it/s, v_num=11, train_loss=4.600]
Epoch 0:   6%|▌         | 43/704 [00:20<05:10,  2.13it/s, v_num=11, train_loss=4.600]
Epoch 0:   6%|▌         | 43/704 [00:20<05:16,  2.09it/s, v_num=11, train_loss=4.600]
Epoch 0:   6%|▋         | 44/704 [00:20<05:09,  2.13it/s, v_num=11, train_loss=4.600]
Epoch 0:   6%|▋         | 44/704 [00:21<05:15,  2.09it/s, v_num=11, train_loss=4.600]
Epoch 0:   6%|▋         | 45/704 [00:21<05:09,  2.13it/s, v_num=11, train_loss=4.600]
Epoch 0:   6%|▋         | 45/704 [00:21<05:14,  2.10it/s, v_num=11, train_loss=4.600]
Epoch 0:   7%|▋         | 46/704 [00:21<05:07,  2.14it/s, v_num=11, train_loss=4.600]
Epoch 0:   7%|▋         | 46/704 [00:21<05:13,  2.10it/s, v_num=11, train_loss=4.600]
Epoch 0:   7%|▋         | 47/704 [00:21<05:06,  2.14it/s, v_num=11, train_loss=4.600]
Epoch 0:   7%|▋         | 47/704 [00:22<05:12,  2.11it/s, v_num=11, train_loss=4.600]
Epoch 0:   7%|▋         | 48/704 [00:22<05:05,  2.15it/s, v_num=11, train_loss=4.600]
Epoch 0:   7%|▋         | 48/704 [00:22<05:10,  2.11it/s, v_num=11, train_loss=4.600]
Epoch 0:   7%|▋         | 49/704 [00:22<05:05,  2.15it/s, v_num=11, train_loss=4.600]
Epoch 0:   7%|▋         | 49/704 [00:23<05:09,  2.11it/s, v_num=11, train_loss=4.600]
Epoch 0:   7%|▋         | 50/704 [00:23<05:04,  2.15it/s, v_num=11, train_loss=4.600]
Epoch 0:   7%|▋         | 50/704 [00:23<05:08,  2.12it/s, v_num=11, train_loss=4.600]
Epoch 0:   7%|▋         | 51/704 [00:23<05:03,  2.15it/s, v_num=11, train_loss=4.600]
Epoch 0:   7%|▋         | 51/704 [00:24<05:08,  2.12it/s, v_num=11, train_loss=4.600]
Epoch 0:   7%|▋         | 52/704 [00:24<05:02,  2.16it/s, v_num=11, train_loss=4.600]
Epoch 0:   7%|▋         | 52/704 [00:24<05:07,  2.12it/s, v_num=11, train_loss=4.600]
Epoch 0:   8%|▊         | 53/704 [00:24<05:01,  2.16it/s, v_num=11, train_loss=4.600]
Epoch 0:   8%|▊         | 53/704 [00:24<05:06,  2.13it/s, v_num=11, train_loss=4.600]
Epoch 0:   8%|▊         | 54/704 [00:24<05:00,  2.16it/s, v_num=11, train_loss=4.600]
Epoch 0:   8%|▊         | 54/704 [00:25<05:05,  2.13it/s, v_num=11, train_loss=4.600]
Epoch 0:   8%|▊         | 55/704 [00:25<04:59,  2.17it/s, v_num=11, train_loss=4.600]
Epoch 0:   8%|▊         | 55/704 [00:25<05:04,  2.13it/s, v_num=11, train_loss=4.600]
Epoch 0:   8%|▊         | 56/704 [00:25<04:58,  2.17it/s, v_num=11, train_loss=4.600]
Epoch 0:   8%|▊         | 56/704 [00:26<05:03,  2.14it/s, v_num=11, train_loss=4.600]
Epoch 0:   8%|▊         | 57/704 [00:26<04:57,  2.17it/s, v_num=11, train_loss=4.600]
Epoch 0:   8%|▊         | 57/704 [00:26<05:02,  2.14it/s, v_num=11, train_loss=4.600]
Epoch 0:   8%|▊         | 58/704 [00:26<04:57,  2.17it/s, v_num=11, train_loss=4.600]
Epoch 0:   8%|▊         | 58/704 [00:27<05:01,  2.14it/s, v_num=11, train_loss=4.600]
Epoch 0:   8%|▊         | 59/704 [00:27<04:56,  2.18it/s, v_num=11, train_loss=4.600]
Epoch 0:   8%|▊         | 59/704 [00:27<05:00,  2.15it/s, v_num=11, train_loss=4.610]
Epoch 0:   9%|▊         | 60/704 [00:27<04:55,  2.18it/s, v_num=11, train_loss=4.610]
Epoch 0:   9%|▊         | 60/704 [00:27<04:59,  2.15it/s, v_num=11, train_loss=4.600]
Epoch 0:   9%|▊         | 61/704 [00:27<04:54,  2.18it/s, v_num=11, train_loss=4.600]
Epoch 0:   9%|▊         | 61/704 [00:28<04:58,  2.15it/s, v_num=11, train_loss=4.600]
Epoch 0:   9%|▉         | 62/704 [00:28<04:53,  2.18it/s, v_num=11, train_loss=4.600]
Epoch 0:   9%|▉         | 62/704 [00:28<04:57,  2.16it/s, v_num=11, train_loss=4.600]
Epoch 0:   9%|▉         | 63/704 [00:28<04:53,  2.19it/s, v_num=11, train_loss=4.600]
Epoch 0:   9%|▉         | 63/704 [00:29<04:56,  2.16it/s, v_num=11, train_loss=4.600]
Epoch 0:   9%|▉         | 64/704 [00:29<04:52,  2.19it/s, v_num=11, train_loss=4.600]
Epoch 0:   9%|▉         | 64/704 [00:29<04:56,  2.16it/s, v_num=11, train_loss=4.600]
Epoch 0:   9%|▉         | 65/704 [00:29<04:51,  2.19it/s, v_num=11, train_loss=4.600]
Epoch 0:   9%|▉         | 65/704 [00:30<04:55,  2.16it/s, v_num=11, train_loss=4.600]
Epoch 0:   9%|▉         | 66/704 [00:30<04:50,  2.20it/s, v_num=11, train_loss=4.600]
Epoch 0:   9%|▉         | 66/704 [00:30<04:54,  2.17it/s, v_num=11, train_loss=4.600]
Epoch 0:  10%|▉         | 67/704 [00:30<04:50,  2.20it/s, v_num=11, train_loss=4.600]
Epoch 0:  10%|▉         | 67/704 [00:30<04:53,  2.17it/s, v_num=11, train_loss=4.600]
Epoch 0:  10%|▉         | 68/704 [00:30<04:49,  2.20it/s, v_num=11, train_loss=4.600]
Epoch 0:  10%|▉         | 68/704 [00:31<04:52,  2.17it/s, v_num=11, train_loss=4.600]
Epoch 0:  10%|▉         | 69/704 [00:31<04:48,  2.20it/s, v_num=11, train_loss=4.600]
Epoch 0:  10%|▉         | 69/704 [00:31<04:52,  2.17it/s, v_num=11, train_loss=4.600]
Epoch 0:  10%|▉         | 70/704 [00:31<04:47,  2.20it/s, v_num=11, train_loss=4.600]
Epoch 0:  10%|▉         | 70/704 [00:32<04:51,  2.18it/s, v_num=11, train_loss=4.600]
Epoch 0:  10%|█         | 71/704 [00:32<04:47,  2.20it/s, v_num=11, train_loss=4.600]
Epoch 0:  10%|█         | 71/704 [00:32<04:50,  2.18it/s, v_num=11, train_loss=4.600]
Epoch 0:  10%|█         | 72/704 [00:32<04:46,  2.21it/s, v_num=11, train_loss=4.600]
Epoch 0:  10%|█         | 72/704 [00:33<04:49,  2.18it/s, v_num=11, train_loss=4.600]
Epoch 0:  10%|█         | 73/704 [00:33<04:45,  2.21it/s, v_num=11, train_loss=4.600]
Epoch 0:  10%|█         | 73/704 [00:33<04:49,  2.18it/s, v_num=11, train_loss=4.600]
Epoch 0:  11%|█         | 74/704 [00:33<04:44,  2.21it/s, v_num=11, train_loss=4.600]
Epoch 0:  11%|█         | 74/704 [00:33<04:48,  2.18it/s, v_num=11, train_loss=4.600]
Epoch 0:  11%|█         | 75/704 [00:33<04:44,  2.21it/s, v_num=11, train_loss=4.600]
Epoch 0:  11%|█         | 75/704 [00:34<04:47,  2.19it/s, v_num=11, train_loss=4.600]
Epoch 0:  11%|█         | 76/704 [00:34<04:44,  2.21it/s, v_num=11, train_loss=4.600]
Epoch 0:  11%|█         | 76/704 [00:34<04:46,  2.19it/s, v_num=11, train_loss=4.600]
Epoch 0:  11%|█         | 77/704 [00:34<04:43,  2.21it/s, v_num=11, train_loss=4.600]
Epoch 0:  11%|█         | 77/704 [00:35<04:46,  2.19it/s, v_num=11, train_loss=4.600]
Epoch 0:  11%|█         | 78/704 [00:35<04:42,  2.21it/s, v_num=11, train_loss=4.600]
Epoch 0:  11%|█         | 78/704 [00:35<04:45,  2.19it/s, v_num=11, train_loss=4.600]
Epoch 0:  11%|█         | 79/704 [00:35<04:42,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  11%|█         | 79/704 [00:36<04:44,  2.19it/s, v_num=11, train_loss=4.600]
Epoch 0:  11%|█▏        | 80/704 [00:36<04:41,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  11%|█▏        | 80/704 [00:36<04:44,  2.20it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▏        | 81/704 [00:36<04:40,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▏        | 81/704 [00:36<04:43,  2.20it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▏        | 82/704 [00:36<04:39,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▏        | 82/704 [00:37<04:42,  2.20it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▏        | 83/704 [00:37<04:39,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▏        | 83/704 [00:37<04:42,  2.20it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▏        | 84/704 [00:37<04:38,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▏        | 84/704 [00:38<04:41,  2.20it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▏        | 85/704 [00:38<04:38,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▏        | 85/704 [00:38<04:40,  2.20it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▏        | 86/704 [00:38<04:37,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▏        | 86/704 [00:39<04:40,  2.20it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▏        | 87/704 [00:39<04:36,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▏        | 87/704 [00:39<04:39,  2.21it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▎        | 88/704 [00:39<04:36,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▎        | 88/704 [00:39<04:39,  2.21it/s, v_num=11, train_loss=4.600]
Epoch 0:  13%|█▎        | 89/704 [00:39<04:36,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  13%|█▎        | 89/704 [00:40<04:38,  2.21it/s, v_num=11, train_loss=4.600]
Epoch 0:  13%|█▎        | 90/704 [00:40<04:35,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  13%|█▎        | 90/704 [00:40<04:37,  2.21it/s, v_num=11, train_loss=4.600]
Epoch 0:  13%|█▎        | 91/704 [00:40<04:34,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  13%|█▎        | 91/704 [00:41<04:37,  2.21it/s, v_num=11, train_loss=4.600]
Epoch 0:  13%|█▎        | 92/704 [00:41<04:34,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  13%|█▎        | 92/704 [00:41<04:36,  2.21it/s, v_num=11, train_loss=4.600]
Epoch 0:  13%|█▎        | 93/704 [00:41<04:33,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  13%|█▎        | 93/704 [00:42<04:35,  2.21it/s, v_num=11, train_loss=4.600]
Epoch 0:  13%|█▎        | 94/704 [00:42<04:32,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  13%|█▎        | 94/704 [00:42<04:35,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  13%|█▎        | 95/704 [00:42<04:32,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  13%|█▎        | 95/704 [00:42<04:34,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  14%|█▎        | 96/704 [00:42<04:31,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  14%|█▎        | 96/704 [00:43<04:34,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  14%|█▍        | 97/704 [00:43<04:31,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  14%|█▍        | 97/704 [00:43<04:33,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  14%|█▍        | 98/704 [00:43<04:30,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  14%|█▍        | 98/704 [00:44<04:32,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  14%|█▍        | 99/704 [00:44<04:30,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  14%|█▍        | 99/704 [00:44<04:32,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  14%|█▍        | 100/704 [00:44<04:29,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  14%|█▍        | 100/704 [00:45<04:32,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  14%|█▍        | 101/704 [00:45<04:29,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  14%|█▍        | 101/704 [00:45<04:31,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  14%|█▍        | 102/704 [00:45<04:28,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  14%|█▍        | 102/704 [00:45<04:30,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  15%|█▍        | 103/704 [00:45<04:28,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  15%|█▍        | 103/704 [00:46<04:30,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  15%|█▍        | 104/704 [00:46<04:27,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  15%|█▍        | 104/704 [00:46<04:29,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  15%|█▍        | 105/704 [00:46<04:26,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  15%|█▍        | 105/704 [00:47<04:29,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  15%|█▌        | 106/704 [00:47<04:26,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  15%|█▌        | 106/704 [00:47<04:28,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  15%|█▌        | 107/704 [00:47<04:25,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  15%|█▌        | 107/704 [00:48<04:28,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  15%|█▌        | 108/704 [00:48<04:25,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  15%|█▌        | 108/704 [00:48<04:27,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  15%|█▌        | 109/704 [00:48<04:24,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  15%|█▌        | 109/704 [00:48<04:26,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  16%|█▌        | 110/704 [00:48<04:24,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  16%|█▌        | 110/704 [00:49<04:26,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  16%|█▌        | 111/704 [00:49<04:23,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  16%|█▌        | 111/704 [00:49<04:25,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  16%|█▌        | 112/704 [00:49<04:23,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  16%|█▌        | 112/704 [00:50<04:25,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  16%|█▌        | 113/704 [00:50<04:22,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  16%|█▌        | 113/704 [00:50<04:24,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  16%|█▌        | 114/704 [00:50<04:22,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  16%|█▌        | 114/704 [00:51<04:24,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  16%|█▋        | 115/704 [00:51<04:21,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  16%|█▋        | 115/704 [00:51<04:23,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  16%|█▋        | 116/704 [00:51<04:21,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  16%|█▋        | 116/704 [00:51<04:23,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  17%|█▋        | 117/704 [00:51<04:20,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  17%|█▋        | 117/704 [00:52<04:22,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  17%|█▋        | 118/704 [00:52<04:20,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  17%|█▋        | 118/704 [00:52<04:21,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  17%|█▋        | 119/704 [00:52<04:19,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  17%|█▋        | 119/704 [00:53<04:21,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  17%|█▋        | 120/704 [00:53<04:19,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  17%|█▋        | 120/704 [00:53<04:20,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  17%|█▋        | 121/704 [00:53<04:18,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  17%|█▋        | 121/704 [00:54<04:20,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  17%|█▋        | 122/704 [00:54<04:18,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  17%|█▋        | 122/704 [00:54<04:19,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  17%|█▋        | 123/704 [00:54<04:17,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  17%|█▋        | 123/704 [00:54<04:19,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  18%|█▊        | 124/704 [00:54<04:17,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  18%|█▊        | 124/704 [00:55<04:18,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  18%|█▊        | 125/704 [00:55<04:16,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  18%|█▊        | 125/704 [00:55<04:18,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  18%|█▊        | 126/704 [00:55<04:15,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  18%|█▊        | 126/704 [00:56<04:17,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  18%|█▊        | 127/704 [00:56<04:15,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  18%|█▊        | 127/704 [00:56<04:17,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  18%|█▊        | 128/704 [00:56<04:15,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  18%|█▊        | 128/704 [00:57<04:16,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  18%|█▊        | 129/704 [00:57<04:14,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  18%|█▊        | 129/704 [00:57<04:16,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  18%|█▊        | 130/704 [00:57<04:13,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  18%|█▊        | 130/704 [00:57<04:15,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  19%|█▊        | 131/704 [00:57<04:13,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  19%|█▊        | 131/704 [00:58<04:15,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  19%|█▉        | 132/704 [00:58<04:12,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  19%|█▉        | 132/704 [00:58<04:14,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  19%|█▉        | 133/704 [00:58<04:12,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  19%|█▉        | 133/704 [00:59<04:14,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  19%|█▉        | 134/704 [00:59<04:12,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  19%|█▉        | 134/704 [00:59<04:13,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  19%|█▉        | 135/704 [00:59<04:11,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  19%|█▉        | 135/704 [01:00<04:13,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  19%|█▉        | 136/704 [01:00<04:11,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  19%|█▉        | 136/704 [01:00<04:12,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  19%|█▉        | 137/704 [01:00<04:10,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  19%|█▉        | 137/704 [01:00<04:12,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  20%|█▉        | 138/704 [01:01<04:10,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  20%|█▉        | 138/704 [01:01<04:11,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  20%|█▉        | 139/704 [01:01<04:09,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  20%|█▉        | 139/704 [01:01<04:11,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  20%|█▉        | 140/704 [01:01<04:09,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  20%|█▉        | 140/704 [01:02<04:10,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  20%|██        | 141/704 [01:02<04:08,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  20%|██        | 141/704 [01:02<04:10,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  20%|██        | 142/704 [01:02<04:08,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  20%|██        | 142/704 [01:03<04:09,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  20%|██        | 143/704 [01:03<04:07,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  20%|██        | 143/704 [01:03<04:09,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  20%|██        | 144/704 [01:03<04:07,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  20%|██        | 144/704 [01:03<04:08,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  21%|██        | 145/704 [01:04<04:06,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  21%|██        | 145/704 [01:04<04:08,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  21%|██        | 146/704 [01:04<04:06,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  21%|██        | 146/704 [01:04<04:07,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  21%|██        | 147/704 [01:04<04:05,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  21%|██        | 147/704 [01:05<04:07,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  21%|██        | 148/704 [01:05<04:05,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  21%|██        | 148/704 [01:05<04:06,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  21%|██        | 149/704 [01:05<04:04,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  21%|██        | 149/704 [01:06<04:06,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  21%|██▏       | 150/704 [01:06<04:04,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  21%|██▏       | 150/704 [01:06<04:05,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  21%|██▏       | 151/704 [01:06<04:04,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  21%|██▏       | 151/704 [01:07<04:05,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  22%|██▏       | 152/704 [01:07<04:03,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  22%|██▏       | 152/704 [01:07<04:04,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  22%|██▏       | 153/704 [01:07<04:03,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  22%|██▏       | 153/704 [01:07<04:04,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  22%|██▏       | 154/704 [01:07<04:02,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  22%|██▏       | 154/704 [01:08<04:03,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  22%|██▏       | 155/704 [01:08<04:02,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  22%|██▏       | 155/704 [01:08<04:03,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  22%|██▏       | 156/704 [01:08<04:01,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  22%|██▏       | 156/704 [01:09<04:02,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  22%|██▏       | 157/704 [01:09<04:01,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  22%|██▏       | 157/704 [01:09<04:02,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  22%|██▏       | 158/704 [01:09<04:00,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  22%|██▏       | 158/704 [01:10<04:01,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  23%|██▎       | 159/704 [01:10<04:00,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  23%|██▎       | 159/704 [01:10<04:01,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  23%|██▎       | 160/704 [01:10<03:59,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  23%|██▎       | 160/704 [01:10<04:01,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  23%|██▎       | 161/704 [01:10<03:59,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  23%|██▎       | 161/704 [01:11<04:00,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  23%|██▎       | 162/704 [01:11<03:58,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  23%|██▎       | 162/704 [01:11<04:00,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  23%|██▎       | 163/704 [01:11<03:58,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  23%|██▎       | 163/704 [01:12<03:59,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  23%|██▎       | 164/704 [01:12<03:57,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  23%|██▎       | 164/704 [01:12<03:59,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  23%|██▎       | 165/704 [01:12<03:57,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  23%|██▎       | 165/704 [01:13<03:58,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  24%|██▎       | 166/704 [01:13<03:56,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  24%|██▎       | 166/704 [01:13<03:58,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  24%|██▎       | 167/704 [01:13<03:56,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  24%|██▎       | 167/704 [01:13<03:57,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  24%|██▍       | 168/704 [01:13<03:55,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  24%|██▍       | 168/704 [01:14<03:57,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  24%|██▍       | 169/704 [01:14<03:55,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  24%|██▍       | 169/704 [01:14<03:56,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  24%|██▍       | 170/704 [01:14<03:55,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  24%|██▍       | 170/704 [01:15<03:56,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  24%|██▍       | 171/704 [01:15<03:54,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  24%|██▍       | 171/704 [01:15<03:55,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  24%|██▍       | 172/704 [01:15<03:54,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  24%|██▍       | 172/704 [01:16<03:55,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  25%|██▍       | 173/704 [01:16<03:53,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  25%|██▍       | 173/704 [01:16<03:54,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  25%|██▍       | 174/704 [01:16<03:53,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  25%|██▍       | 174/704 [01:16<03:54,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  25%|██▍       | 175/704 [01:16<03:52,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  25%|██▍       | 175/704 [01:17<03:53,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  25%|██▌       | 176/704 [01:17<03:52,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  25%|██▌       | 176/704 [01:17<03:53,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  25%|██▌       | 177/704 [01:17<03:51,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  25%|██▌       | 177/704 [01:18<03:52,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  25%|██▌       | 178/704 [01:18<03:51,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  25%|██▌       | 178/704 [01:18<03:52,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  25%|██▌       | 179/704 [01:18<03:50,  2.27it/s, v_num=11, train_loss=4.600]
Epoch 0:  25%|██▌       | 179/704 [01:19<03:51,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  26%|██▌       | 180/704 [01:19<03:50,  2.27it/s, v_num=11, train_loss=4.600]

Nothing crashes and the model summary looks fine, but the training loss does not change.
(It varies slightly from batch to batch, but that comes from the different samples, not from the model being trained.)
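
For reference, the reported 4.60 is close to ln(100) ≈ 4.605, which is the cross-entropy of a uniform prediction over 100 classes. The number of classes is not stated in this report, so 100 is an assumption here. If it holds, a loss stuck near that value would be consistent with the model producing nearly identical logits for every class. A minimal pure-Python sketch of that arithmetic:

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of a single sample computed from raw logits."""
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


num_classes = 100  # assumption: the class count is not given in the report
uniform_logits = [0.0] * num_classes  # every class gets the same score
chance_loss = cross_entropy(uniform_logits, target=3)
print(f"{chance_loss:.3f}")  # prints 4.605
```

The target index does not matter when all logits are equal, so any class gives the same value.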

Environment

Current environment
#- PyTorch Lightning Version (e.g., 2.4.0): 2.3.3
#- PyTorch Version (e.g., 2.4): 2.3.1
#- Python version (e.g., 3.12): 3.10.14
#- OS (e.g., Linux):  Linux
#- CUDA/cuDNN version: 12.4
#- GPU models and configuration: 3090
#- How you installed Lightning(`conda`, `pip`, source): pip

By the way, the collect env script does not work either:

Traceback (most recent call last):
 
  File "/conda/envs/ai/lib/python3.10/site-packages/pkg_resources/_vendor/pyparsing.py", line 2711, in parseImpl
    raise ParseException(instring, loc, self.errmsg, self)
pkg_resources._vendor.pyparsing.ParseException: Expected W:(abcd...) (at char 0), (line:1, col:1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
    raise InvalidRequirement(
pkg_resources.extern.packaging.requirements.InvalidRequirement: Parse error at "'-cipy==1'": Expected W:(abcd...)

More info

No response

2catycm added the bug and needs triage labels on Oct 15, 2024

2catycm commented Oct 15, 2024

My full code is a bit complicated, but I believe the problem lies in the logic shown above. Did I use Lightning incorrectly in that code?


2catycm commented Oct 15, 2024

I would be grateful if someone could give me any idea of what might cause this issue.

I wondered whether cls_model had been frozen by mistake, but it has not: its parameters have requires_grad set to True.


2catycm commented Oct 15, 2024

Maybe this is related to issue #20128.

I am also using Hugging Face's AutoModel.from_pretrained, and the model is in eval mode.

I tried switching it to training mode manually, but that did not help.


2catycm commented Oct 16, 2024

No, that issue is not the cause. I double-checked that I call nn.Module.train() after loading the model with AutoModel.from_pretrained.


2catycm commented Oct 16, 2024

To debug, I print the L2 norms of the parameters and gradients every time training_step is called. Something interesting happens:

Grad Norm: 0.08665306866168976
Params Norm before step: 771.8257446289062
Params Norm after step: 771.8740234375
Grad Norm: 9.2427133654982e-12
Params Norm before step: 771.8740234375
Params Norm after step: 771.8740234375
Grad Norm: 1.773968298646178e-11
Params Norm before step: 771.8740234375
Params Norm after step: 771.8740234375
Grad Norm: 1.1152222808424872e-12
Params Norm before step: 771.8740234375
Params Norm after step: 771.8740234375
Grad Norm: 6.962481264270737e-13
Params Norm before step: 771.8740234375
Params Norm after step: 771.8740234375
Grad Norm: 1.828729181974076e-11
Params Norm before step: 771.8740234375
Params Norm after step: 771.8740234375

The optimizer did change the model (that is, self, the L.LightningModule instance) on the first step. After that, however, the gradient norm collapses to around 1e-11 or smaller and the parameter norm stops changing.

Could any expert kindly tell me where I am using Lightning incorrectly?
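
A minimal sketch of how such norms could be collected around an optimizer step, in plain PyTorch. The helper names `global_l2_norm`, `grad_norm`, and `param_norm` are hypothetical, and the tiny `nn.Linear` model stands in for the real one; this is not the code that produced the log above.

```python
import torch
from torch import nn


def global_l2_norm(tensors):
    """L2 norm over all elements of all tensors, as if they were concatenated."""
    norms = [t.detach().norm(2) for t in tensors]
    return torch.stack(norms).norm(2).item() if norms else 0.0


def grad_norm(module):
    """Global L2 norm of the gradients that are currently populated."""
    return global_l2_norm(p.grad for p in module.parameters() if p.grad is not None)


def param_norm(module):
    """Global L2 norm of all parameters."""
    return global_l2_norm(module.parameters())


torch.manual_seed(0)
model = nn.Linear(4, 3)  # stand-in model, purely illustrative
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 4)
y = torch.randint(0, 3, (8,))

optimizer.zero_grad()
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()

print("Grad Norm:", grad_norm(model))
print("Params Norm before step:", param_norm(model))
optimizer.step()
print("Params Norm after step:", param_norm(model))
```

Logging the per-layer norms as well, rather than only the global value, may show whether the gradient vanishes everywhere or only in part of the network.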

@arijit-hub

Hi,

I am not an expert, but I checked your code and you seem to call loss.backward() instead of self.manual_backward(loss), which is what the documentation prescribes for manual optimization (https://lightning.ai/docs/pytorch/stable/model/manual_optimization.html#manual-optimization).

Could you check whether this helps?
