This repository has been archived by the owner on Feb 9, 2023. It is now read-only.

TensorFlow multi_gpu_model function is deprecated #24

Open
HunbeomBak opened this issue Apr 8, 2020 · 3 comments
Labels
bug Something isn't working

Comments


HunbeomBak commented Apr 8, 2020

Hello, I want to train on my dataset, and I have two GPUs.

Below is my code.

```python
pipeline = asr.pipeline.CTCPipeline(
    alphabet, features_extractor, model, optimizer, decoder, gpus=['gpu:0', 'gpu:1']
)
dataset = pipeline.wrap_preprocess(dataset, False, None)
dev_dataset = pipeline.wrap_preprocess(dev_dataset, False, None)

y = tf.keras.layers.Input(name='y', shape=[None], dtype='int32')
loss = pipeline.get_loss()
pipeline._model.compile(pipeline._optimizer, loss, target_tensors=[y])
pipeline._model.fit(dataset, validation_data=dev_dataset, epochs=100)
pipeline._model.save(os.path.join('/checkpoint', 'model.h5'))
```

But the model uses only one GPU:

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.44       Driver Version: 440.44       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp            Off  | 00000000:67:00.0 Off |                  N/A |
| 48%   78C    P2   220W / 250W | 11861MiB / 12196MiB  |     86%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN Xp            Off  | 00000000:68:00.0  On |                  N/A |
| 27%   45C    P8    12W / 250W |   574MiB / 12194MiB  |      0%      Default |
+-------------------------------+----------------------+----------------------+
```

It seems that an OOM occurs when the batch size is increased.

@rolczynski
Owner

Hey @HunbeomBak

Alala... we have to fix this, because the multi_gpu_model function was deprecated on 1 April; more details here.

As you can see in this link, we have to use MirroredStrategy in our case. If you want to help, please make a Pull Request. Otherwise, the hotfix is to create the model within the context of the strategy. Please read the detailed tutorial directly on the TensorFlow page here.
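For reference, a minimal sketch of that hotfix, using a toy stand-in model (the layer sizes and loss here are placeholders, not the pipeline's real DeepSpeech model):

```python
import tensorflow as tf

# Build and compile the model inside the MirroredStrategy scope so its
# variables are mirrored across all visible GPUs. With no GPUs present,
# the strategy transparently falls back to a single CPU replica.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Placeholder network; in automatic-speech-recognition this would be
    # the model that gets passed into CTCPipeline.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# model.fit(dataset, ...) then splits each global batch across the replicas.
```

Note that with MirroredStrategy the batch size in the dataset is the global batch size, so each replica sees `batch_size / num_replicas_in_sync` examples per step.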

Please tell me if you can fix it.

@rolczynski rolczynski changed the title How can i use multi gpus TensorFlow multi_gpu_model function is deprecated Apr 8, 2020
@rolczynski rolczynski added the bug Something isn't working label Apr 8, 2020
@djo-koconi

Hi @rolczynski
It seems that the target_tensors option used when compiling the model is not supported under tf.distribute.MirroredStrategy.

Do you have any idea how the target_tensors option could be avoided, with the model mapped to the right target automatically?

In my experiments, just disabling the target_tensors option throws a CTC-loss-related error when compiling the model. This is the snippet I am referring to:

```python
def compile_model(self):
    """ The compiled model means the model configured for training. """
    y = keras.layers.Input(name='y', shape=[None], dtype='int32')
    loss = self.get_loss()
    self._model.compile(self._optimizer, loss, target_tensors=[y])
    logger.info("Model is successfully compiled")
```

@rolczynski
Owner

It's quite a big change. As we can see, things tend to get more and more complicated when we try to stick with the functional API. We want to build models effortlessly, so I think we should do it by subclassing the tf.keras.Model class.
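To make the subclassing idea concrete, here is a hypothetical sketch (CTCModel, the layer sizes, and the zero-padded-labels convention are all assumptions, not the repo's actual code): the CTC loss is computed inside the model via add_loss, so compile() needs no target_tensors and the model can be created under MirroredStrategy.

```python
import tensorflow as tf

class CTCModel(tf.keras.Model):
    def __init__(self, units, alphabet_size):
        super().__init__()
        self.rnn = tf.keras.layers.LSTM(units, return_sequences=True)
        self.logits = tf.keras.layers.Dense(alphabet_size + 1)  # +1 for the CTC blank

    def call(self, inputs):
        # Labels travel inside the batch instead of being compile-time targets.
        features, labels = inputs
        probs = tf.nn.softmax(self.logits(self.rnn(features)))
        batch = tf.shape(probs)[0]
        input_len = tf.fill([batch, 1], tf.shape(probs)[1])
        # Assumes labels are zero-padded and 0 is not a real character.
        label_len = tf.math.count_nonzero(labels, axis=1, keepdims=True,
                                          dtype=tf.int32)
        self.add_loss(tf.reduce_mean(
            tf.keras.backend.ctc_batch_cost(labels, probs, input_len, label_len)))
        return probs
```

With this shape, `model.compile(optimizer)` takes no loss argument, and `model.fit` consumes a dataset of `(features, labels)` tuples as plain inputs.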
