
NotImplementedError for iterator in IterableData class while debugging CodonTransformer finetuning #17

Open

Cauwth opened this issue Dec 25, 2024 · 8 comments
Labels: bug (Something isn't working)

@Cauwth commented Dec 25, 2024

I am trying to implement a subclass of IterableData to iterate over a JSON file to finetune the model, but I am encountering an error. The IterableData class has an abstract iterator method that is supposed to be implemented in subclasses. However, I am unsure how to correctly implement the iterator method in my IterableJSONData class.

I am not using SLURM.

```python
train_data = IterableJSONData(args.dataset_dir)
```

and the error is:

```
Exception has occurred: NotImplementedError
Caught NotImplementedError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/tianhao/miniconda3/envs/CodonTransformer/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 291, in _worker_loop
    fetcher = _DatasetKind.create_fetcher(
  File "/home/tianhao/miniconda3/envs/CodonTransformer/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 81, in create_fetcher
    return _utils.fetch._IterableDatasetFetcher(
  File "/home/tianhao/miniconda3/envs/CodonTransformer/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 22, in __init__
    self.dataset_iter = iter(dataset)
  File "/data/wth/plant_protein/CodonTransformer/CodonTransformer/CodonUtils.py", line 541, in __iter__
    return itertools.islice(self.iterator, worker_rk, None, worker_nb)
  File "/data/wth/plant_protein/CodonTransformer/CodonTransformer/CodonUtils.py", line 517, in iterator
    raise NotImplementedError
NotImplementedError
```
How should I implement the iterator method in the IterableJSONData subclass so that it properly reads the JSON file line by line and handles multi-process data loading?

I have tried adding code like this to the IterableJSONData class:

[image]

but got another error:

```
Exception has occurred: ValueError
Expected positive integer total_steps, but got -1
  File "/data/wth/plant_protein/CodonTransformer/finetune.py", line 87, in configure_optimizers
    "scheduler": torch.optim.lr_scheduler.OneCycleLR(
  File "/data/wth/plant_protein/CodonTransformer/finetune.py", line 167, in main
    trainer.fit(harnessed_model, data_loader)
  File "/data/wth/plant_protein/CodonTransformer/finetune.py", line 231, in <module>
    main(args)
ValueError: Expected positive integer total_steps, but got -1
```

@gui11aume (Collaborator)

Hi @Cauwth, and thanks for raising the issue. It looks to me like some part of the code is missing. The code was taken from this repo, where IterableJSONData overrides the iterator method to implement it. Would you try replacing the code with the one I pointed to and see if it works out of the box?
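For reference, an override along these lines should satisfy the base class (an untested sketch: it assumes the data is stored as newline-delimited JSON and that the base class exposes iterator as a property, as the traceback above suggests; the `data_path` constructor argument is illustrative):

```python
import json

from CodonTransformer.CodonUtils import IterableData


class IterableJSONData(IterableData):
    """Streams records from a JSON-lines file, one example per line."""

    def __init__(self, data_path, **kwargs):
        # `data_path` is a hypothetical argument name; match it to the
        # actual constructor call used in finetune.py.
        super().__init__(**kwargs)
        self.data_path = data_path

    @property
    def iterator(self):
        # The base class's __iter__ shards this stream across DataLoader
        # workers via itertools.islice, so yielding lazily keeps memory
        # use flat in multi-process loading.
        with open(self.data_path, "r") as f:
            for line in f:
                yield json.loads(line)
```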

@Cauwth (Author) commented Dec 26, 2024

Thank you for your suggestion! I have updated the iterator method in IterableJSONData based on the recommended repository. The updated code is as follows:
[image]

In debug mode, the data reading process seems to work correctly. However, the error still occurs here:

[image]

I was wondering: should I manually compute the total_steps value?

@gui11aume (Collaborator)

Yes, exactly! An iterable dataset is just a stream, so there is no way for the data loader to know how many steps there are. That is not always an issue; you can train until the stream is exhausted, but a learning-rate scheduler needs a number of steps so that it knows when to warm up and when to decay. It's just a matter of specifying the value of total_steps in your case.
You can compute it from the number of examples you have (n_examples), the batch size (batch_size), the number of GPUs (n_gpus), and the gradient accumulation factor (gradient_accumulation) as n_examples / (batch_size * n_gpus * gradient_accumulation). If I remember correctly, you need to divide by gradient_accumulation because the learning rate is updated only on stepping batches, i.e., batches where the backpropagation step is applied.
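
In code, the computation might look like this (all numbers are placeholders, and the tiny model exists only to make the snippet self-contained):

```python
import torch

# Placeholder values; substitute your own run configuration.
n_examples = 100_000
batch_size = 32
n_gpus = 1
gradient_accumulation = 4

# One scheduler step happens per `gradient_accumulation` batches, and
# each batch is spread across `n_gpus` devices.
total_steps = n_examples // (batch_size * n_gpus * gradient_accumulation)

# The value is then handed to the scheduler, e.g. in configure_optimizers:
model = torch.nn.Linear(8, 8)  # stand-in model so the snippet runs
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-4, total_steps=total_steps
)
```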

@Cauwth (Author) commented Dec 28, 2024

It works. Thank you so much!

@Cauwth Cauwth closed this as completed Dec 28, 2024
@gui11aume (Collaborator)

Thank you for raising the issue! We were not aware that there was a problem with the code. I will reopen the issue until we fix the code.

@gui11aume gui11aume reopened this Dec 28, 2024
@gui11aume (Collaborator)

@Adibvafa Can you prepare a pull request to fix issue #17?

@Cauwth (Author) commented Dec 29, 2024

After setting total_steps, the code does run, but sometimes the actual number of training steps exceeds the predefined maximum. This might be because batch_size does not evenly divide the dataset size. I couldn't find a way to resolve this, so I had to set total_steps to a value much larger than the calculated expectation.
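
One possible workaround (a sketch, not verified against this repo) is to round up rather than truncate when computing total_steps, so the final partial batch still counts as a scheduler step:

```python
import math

# Placeholder values; substitute your own run configuration.
n_examples = 100_000
batch_size = 32
n_gpus = 1
gradient_accumulation = 4

# math.ceil counts the last, partially filled batch as a full step, so
# the computed total_steps no longer undershoots the actual step count.
total_steps = math.ceil(n_examples / (batch_size * n_gpus * gradient_accumulation))
```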

@Adibvafa Adibvafa self-assigned this Jan 4, 2025
@Adibvafa Adibvafa added the bug Something isn't working label Jan 4, 2025
@Adibvafa (Owner) commented Jan 4, 2025

I will work on this over the weekend. Thank you for opening this issue!
