This repository has been archived by the owner on Feb 9, 2023. It is now read-only.

Negative loss value #22

Open
st-tomic opened this issue Apr 5, 2020 · 7 comments
Labels
bug Something isn't working

Comments

@st-tomic

st-tomic commented Apr 5, 2020

Hi,

I am trying to run the sample training on LibriSpeech clean 100h.

After a few hours of training with batch size 10, the printed loss value becomes negative.
It happens in the first epoch.

The only thing I changed is the read_audio function: it now uses soundfile to read FLAC files instead of reading WAV files with wavfile.read. Both give the same output when reading files, so it shouldn't make a difference.

Are you familiar with the issue? Loss seems to decrease too fast.
Any guess what is going wrong?
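When swapping one audio reader for another, it is worth confirming that sample scale and dtype match, not just the waveform shape. As far as I know, scipy's wavfile.read returns integer samples for 16-bit PCM, while soundfile.read returns float64 in [-1.0, 1.0) unless a dtype such as 'int16' is requested. The sketch below uses only the standard library to show how the same audio can look different to a model; `pcm16_to_float` and `same_samples` are hypothetical helpers, not part of the repo.

```python
from array import array

def pcm16_to_float(samples):
    """Scale signed 16-bit PCM integers to floats in [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]

def same_samples(a, b, tol=1e-6):
    """True only if both sequences match sample by sample within tol."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))

# Stand-in for what an int16 reader would return (assumed values).
int_samples = array('h', [0, 16384, -16384, 32767])
# Stand-in for what a float reader would return for the same audio.
float_samples = pcm16_to_float(int_samples)

# Same audio, different scale: a direct comparison fails.
print(same_samples(int_samples, float_samples))
# After putting both on the same scale, they agree.
print(same_samples(pcm16_to_float(int_samples), float_samples))
```

If the two readers disagree only by this factor, the feature extraction step sees inputs that differ by several orders of magnitude, so a check like this is cheap insurance before a long training run.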

@rolczynski
Owner

rolczynski commented Apr 5, 2020

Hey! Super weird. Could you provide more details?

@st-tomic
Author

st-tomic commented Apr 6, 2020

I agree :)

The only changes are the ones stated above. I used the pipeline from basic.py and the README.
I have also tried it with tf v2.0 (GPU), and the loss again decreases extremely fast within the first epoch, which doesn't seem realistic.

I only modified the audio loading part to use soundfile, and used future-fstrings to support f-strings on Python 3.5.
Other than that, nothing is changed from your repo.

In the last run, the loss froze at -0.6921.

Data loaded from LibriSpeech:

dataset = asr.dataset.Audio.from_csv('examples/libri-100.csv', batch_size=10)
dev_dataset = asr.dataset.Audio.from_csv('examples/dev-clean.csv', batch_size=10)
test_dataset = asr.dataset.Audio.from_csv('examples/test-clean.csv')

@HunbeomBak

HunbeomBak commented Apr 6, 2020

I have the same problem.

I tried training on my own dataset.

`
You do not need to update to CUDA 9.2.88; cherry-picking the ptxas binary is sufficient.
475/476 [============================>.] - ETA: 1s - loss: 2.6582

476/476 [==============================] - 802s 2s/step - loss: 2.6511 - val_loss: -0.6931
Epoch 2/5
119/476 [======>.......................] - ETA: 5:47 - loss: -0.6931
`

After the first epoch, val_loss was negative.

The second epoch also had a negative loss.
The loss does not change and stays at -0.6931 throughout the second epoch.

I used your environment-gpu.yml to create the conda environment.

My English is not great, but I did my best.
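To pin down where the loss first goes negative, the Keras progress output can be scanned automatically. The sketch below is a hypothetical helper (`negative_losses` is not part of the repo) run on the log lines quoted above. It also prints -ln 2 for comparison, since the plateau value matches it to four decimals; that is only an arithmetic observation and not a diagnosis of the cause.

```python
import math
import re

# Matches "loss: 2.6582" and "val_loss: -0.6931" in Keras progress lines.
LOSS_RE = re.compile(r"\b(val_loss|loss): (-?\d+\.\d+)")

def negative_losses(log_text):
    """Return (line_number, name, value) for every negative loss in the log."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        for name, value in LOSS_RE.findall(line):
            if float(value) < 0:
                hits.append((lineno, name, float(value)))
    return hits

log = """\
475/476 [============================>.] - ETA: 1s - loss: 2.6582
476/476 [==============================] - 802s 2s/step - loss: 2.6511 - val_loss: -0.6931
Epoch 2/5
119/476 [======>.......................] - ETA: 5:47 - loss: -0.6931
"""

hits = negative_losses(log)
print(hits)
print(round(-math.log(2), 4))
```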

@vmarkiNN

vmarkiNN commented Apr 7, 2020

Hi, I had the same problem. For me, it turned out that the pipeline.fit() method returns an empty string instead of the correct transcript, so the model learns to predict it. I used the following code instead, and it works:

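import tensorflow as tf

# `pipeline` and `dataset` are the objects created earlier, as in basic.py.
dataset = pipeline.wrap_preprocess(dataset, False, None)
y = tf.keras.layers.Input(name='y', shape=[None], dtype='int32')
loss = pipeline.get_loss()
# Compile and fit the underlying Keras model directly instead of calling pipeline.fit().
pipeline._model.compile(pipeline._optimizer, loss, target_tensors=[y])
pipeline._model.fit(dataset, epochs=20)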
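Since the suspected cause is that training targets end up as empty strings, a quick sanity check on the input CSV can rule out empty transcripts at the source. This is a minimal sketch under assumptions: the column names `path` and `transcript` and the helper `rows_with_empty_transcript` are hypothetical, so adjust them to whatever the CSV files actually contain.

```python
import csv
import io

def rows_with_empty_transcript(csv_file, column="transcript"):
    """Return the 1-based data-row numbers whose transcript is missing or blank."""
    reader = csv.DictReader(csv_file)
    return [i for i, row in enumerate(reader, start=1)
            if not (row.get(column) or "").strip()]

# In-memory stand-in for a dataset CSV (assumed layout).
sample = io.StringIO(
    "path,transcript\n"
    "a.flac,hello world\n"
    "b.flac,\n"
    "c.flac,   \n"
)

bad_rows = rows_with_empty_transcript(sample)
print(bad_rows)
```

If this returns an empty list for the real CSV, the transcripts are fine on disk and any empty targets must be introduced later in the pipeline.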

@rolczynski rolczynski added the bug Something isn't working label Apr 8, 2020
@askinucuncu

I guess there has been no progress on this, because the system still produces negative values.

@wenjingyang

wenjingyang commented Mar 9, 2021

I had a similar issue even when I just ran the example (basic.py) on tf v2.1. I think the negative loss value might be acceptable; see keras-team/keras#9369.

But when I used predict on test.csv (the same file used for training), the output is empty: ['']. That doesn't look reasonable.
Code in basic.py: pipeline.predict(data)

Epoch 1/5
1/1 [==============================] - 3s 3s/step - loss: 303.5161
1/1 [==============================] - 19s 19s/step - loss: 610.6132 - val_loss: 303.5161
Epoch 2/5
Epoch 1/5
1/1 [==============================] - 1s 1s/step - loss: 61.1079
1/1 [==============================] - 7s 7s/step - loss: 76.0996 - val_loss: 61.1079
Epoch 3/5
Epoch 1/5
1/1 [==============================] - 1s 1s/step - loss: 9.9619
1/1 [==============================] - 7s 7s/step - loss: 4.4410 - val_loss: 9.9619
Epoch 4/5
Epoch 1/5
1/1 [==============================] - 1s 1s/step - loss: 2.4088
1/1 [==============================] - 7s 7s/step - loss: 0.6229 - val_loss: 2.4088
Epoch 5/5
Epoch 1/5
1/1 [==============================] - 1s 1s/step - loss: 0.5115
1/1 [==============================] - 7s 7s/step - loss: -0.3944 - val_loss: 0.5115
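One way an empty prediction like [''] can arise is greedy CTC-style decoding over a model that emits the blank label at every frame. The thread does not confirm how the pipeline decodes, so the sketch below is an assumption-laden illustration: `greedy_ctc_decode`, the two-letter alphabet, and the blank index are all hypothetical.

```python
def greedy_ctc_decode(frame_labels, alphabet, blank):
    """Collapse consecutive repeated labels, then drop blanks."""
    chars = []
    previous = None
    for label in frame_labels:
        # Emit a character only when the label changes and is not the blank.
        if label != previous and label != blank:
            chars.append(alphabet[label])
        previous = label
    return "".join(chars)

alphabet = "ab"   # toy alphabet (assumed)
blank = 2         # blank index placed after the alphabet (assumed)

normal = greedy_ctc_decode([0, 0, 2, 0, 1, 1], alphabet, blank)
all_blank = greedy_ctc_decode([2, 2, 2, 2], alphabet, blank)
print(repr(normal))
print(repr(all_blank))
```

Under these assumptions, a model that has learned to predict empty targets would decode to an empty string for every input, which is consistent with the symptom reported above.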

6 participants