Dear Coqui Community,

As former DeepSpeech users, we are now switching to coqui-stt v1.3.0. We mainly fine-tune on in-house technical datasets.
We are facing a "classic" issue with some of our data: "Invalid argument: Not enough time for target transition sequence". We more or less understand the technical cause. We already run a watchdog that discards audio clips that are too short for their number of words (based on the average time per word computed over our datasets), but the problem reappeared when we switched to coqui-stt.
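The condition behind this error can be written down explicitly. This is a sketch under stated assumptions: it follows how TensorFlow's CTC loss checks label length against the number of time steps, and assumes Coqui STT's default feature settings (`--feature_win_len 32`, `--feature_win_step 20`, in ms), i.e. a window of W = 512 and a step of S = 320 samples at 16 kHz. Let U be the number of output labels in the transcript (characters, spaces included, not words), R the number of adjacent repeated labels (CTC needs a blank between two identical labels), and N the number of audio samples. A sample can be aligned only if:

```math
U + R \le T, \qquad T = 1 + \left\lfloor \frac{N - W}{S} \right\rfloor \quad \text{for } N \ge W \; (T = 0 \text{ otherwise})
```

For the failing sample in the log below (required: 40, available: 39), T = 39 means 38·320 ≤ N − 512 < 39·320, i.e. 12672 ≤ N ≤ 12991 samples, roughly 0.79 to 0.81 s of 16 kHz audio carrying 40 labels. Because the budget is about one label per 20 ms frame, counted in characters, a word-based watchdog can still let through clips with long words or many doubled letters.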
We know how to work around the problem (by adding `ignore_longer_outputs_than_inputs=True` to the `ctc_loss` call in train.py and evaluate.py), but the question is how much of our data is affected. Could silently skipping these samples bias our training?
I read this topic and tried the code proposed there, but it no longer works.
Could you help with that? How can I identify which samples in my dataset are invalid?
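One way to find and count the offending rows offline is to apply the CTC length condition to every CSV row before training. The script below is a minimal, hypothetical sketch (not part of Coqui STT). It assumes the standard Coqui STT CSV columns `wav_filename` and `transcript`, PCM WAV files readable by Python's `wave` module, relative paths resolved against the CSV's folder, default feature window/step, and one label per character. If you train with `--bytes_output_mode`, pass `bytes_mode=True` so that each UTF-8 byte counts as a label (accented characters then cost two). The frame count is an approximation of the feature pipeline.

```python
import csv
import os
import sys
import wave

# Assumed Coqui STT defaults (--feature_win_len / --feature_win_step), in ms.
WIN_LEN_MS = 32
WIN_STEP_MS = 20


def available_frames(num_samples, sample_rate, win_len_ms=WIN_LEN_MS, win_step_ms=WIN_STEP_MS):
    """Approximate number of feature frames, i.e. CTC time steps, for a clip."""
    window = sample_rate * win_len_ms // 1000
    step = sample_rate * win_step_ms // 1000
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // step


def required_frames(transcript, bytes_mode=False):
    """Minimum CTC time steps: one per label, plus one blank between repeated labels."""
    labels = list(transcript.encode("utf-8")) if bytes_mode else list(transcript)
    repeats = sum(1 for prev, cur in zip(labels, labels[1:]) if prev == cur)
    return len(labels) + repeats


def find_invalid(csv_path, bytes_mode=False):
    """Return ([(wav_path, required, available), ...], total_rows) for one CSV."""
    base_dir = os.path.dirname(os.path.abspath(csv_path))
    invalid, total = [], 0
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            total += 1
            # Relative paths are resolved against the CSV's folder (assumption);
            # absolute paths are kept unchanged by os.path.join.
            wav_path = os.path.join(base_dir, row["wav_filename"])
            with wave.open(wav_path, "rb") as wav:
                have = available_frames(wav.getnframes(), wav.getframerate())
            need = required_frames(row["transcript"], bytes_mode)
            if need > have:
                invalid.append((wav_path, need, have))
    return invalid, total


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_ctc_lengths.py train.csv dev.csv ...
    for path in sys.argv[1:]:
        bad, total = find_invalid(path)
        for wav_path, need, have in bad:
            print(f"{wav_path}\trequired={need}\tavailable={have}")
        print(f"{path}: {len(bad)} of {total} samples are too short for their transcript")
```

The final count per CSV gives a rough answer to "how much data is affected". The script only sees the files on disk, so any augmentation that shortens audio at training time is not accounted for.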
Many thanks in advance.
Fabien.
Full log and stack trace:
I Performing dummy training to check for memory problems.
I If the following process crashes, you likely have batch sizes that are too big for your available system memory (or GPU memory).
I Loading best validating checkpoint from /data/220415_1250/checkpoint/best_dev-118584
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam_1
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/bias/Adam
I Loading variable from checkpoint: layer_1/bias/Adam_1
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_1/weights/Adam
I Loading variable from checkpoint: layer_1/weights/Adam_1
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/bias/Adam
I Loading variable from checkpoint: layer_2/bias/Adam_1
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_2/weights/Adam
I Loading variable from checkpoint: layer_2/weights/Adam_1
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/bias/Adam
I Loading variable from checkpoint: layer_3/bias/Adam_1
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_3/weights/Adam
I Loading variable from checkpoint: layer_3/weights/Adam_1
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/bias/Adam
I Loading variable from checkpoint: layer_5/bias/Adam_1
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_5/weights/Adam
I Loading variable from checkpoint: layer_5/weights/Adam_1
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/bias/Adam
I Loading variable from checkpoint: layer_6/bias/Adam_1
I Loading variable from checkpoint: layer_6/weights
I Loading variable from checkpoint: layer_6/weights/Adam
I Loading variable from checkpoint: layer_6/weights/Adam_1
I Loading variable from checkpoint: learning_rate
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Epoch 0 | Training | Elapsed Time: 0:00:05 | Steps: 1 | Loss: 184.835358
Epoch 0 | Training | Elapsed Time: 0:00:05 | Steps: 1 | Loss: 184.835358
Epoch 0 | Validation | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000 | Dataset: /data/dataset_val.csv
Epoch 0 | Validation | Elapsed Time: 0:00:03 | Steps: 1 | Loss: 191.520050 | Dataset: /data/datasets/dataset_val.csv
Epoch 0 | Validation | Elapsed Time: 0:00:03 | Steps: 1 | Loss: 191.520050 | Dataset: /data/datasets/dataset_val.csv
--------------------------------------------------------------------------------
I FINISHED optimization in 0:00:09.462788
I Dummy run finished without problems, now starting real training process.
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Epoch 0 | Training | Elapsed Time: 0:00:04 | Steps: 1 | Loss: 184.835358
Epoch 0 | Training | Elapsed Time: 0:00:06 | Steps: 2 | Loss: 170.022476
Epoch 0 | Training | Elapsed Time: 0:00:08 | Steps: 3 | Loss: 165.726517
Epoch 0 | Training | Elapsed Time: 0:00:09 | Steps: 4 | Loss: 155.419304
Epoch 0 | Training | Elapsed Time: 0:00:11 | Steps: 5 | Loss: 150.382837
Epoch 0 | Training | Elapsed Time: 0:00:13 | Steps: 6 | Loss: 145.862333
Epoch 0 | Training | Elapsed Time: 0:00:15 | Steps: 7 | Loss: 142.092943
Epoch 0 | Training | Elapsed Time: 0:00:16 | Steps: 8 | Loss: 139.344593
Epoch 0 | Training | Elapsed Time: 0:00:18 | Steps: 9 | Loss: 139.167085
Epoch 0 | Training | Elapsed Time: 0:00:20 | Steps: 10 | Loss: 135.617424
Epoch 0 | Training | Elapsed Time: 0:00:22 | Steps: 11 | Loss: 133.399096
Epoch 0 | Training | Elapsed Time: 0:00:24 | Steps: 12 | Loss: 132.209274
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1349, in _run_fn
return self._call_tf_sessionrun(options, feed_dict, fetch_list,
File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1441, in _call_tf_sessionrun
return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Not enough time for target transition sequence (required: 40, available: 39)23You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs
[[{{node tower_0/CTCLoss}}]]
[[tower_0/CTCLoss/_89]]
(1) Invalid argument: Not enough time for target transition sequence (required: 40, available: 39)23You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs
[[{{node tower_0/CTCLoss}}]]
0 successful operations.
1 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "home/xxxx/Coqui-STT/training/coqui_stt_training/train.py", line 726, in <module>
main()
File "home/xxxx/Coqui-STT/training/coqui_stt_training/train.py", line 696, in main
train()
File "home/xxxx/Coqui-STT/training/coqui_stt_training/train.py", line 335, in train
train_impl(epochs=Config.epochs, silent_load=True)
File "home/xxxx/Coqui-STT/training/coqui_stt_training/train.py", line 569, in train_impl
train_loss, _ = run_set("train", epoch, train_init_op)
File "home/xxxx/Coqui-STT/training/coqui_stt_training/train.py", line 515, in run_set
) = session.run(
File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 955, in run
result = self._run(None, fetches, feed_dict, options_ptr,
File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1179, in _run
results = self._do_run(handle, final_targets, final_fetches,
File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1358, in _do_run
return self._do_call(_run_fn, feeds, fetches, targets, options,
File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Not enough time for target transition sequence (required: 40, available: 39)23You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs
[[node tower_0/CTCLoss (defined at usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
[[tower_0/CTCLoss/_89]]
(1) Invalid argument: Not enough time for target transition sequence (required: 40, available: 39)23You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs
[[node tower_0/CTCLoss (defined at usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
1 derived errors ignored.
Original stack trace for 'tower_0/CTCLoss':
File "usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "home/xxxx/Coqui-STT/training/coqui_stt_training/train.py", line 726, in <module>
main()
File "home/xxxx/Coqui-STT/training/coqui_stt_training/train.py", line 696, in main
train()
File "home/xxxx/Coqui-STT/training/coqui_stt_training/train.py", line 335, in train
train_impl(epochs=Config.epochs, silent_load=True)
File "home/xxxx/Coqui-STT/training/coqui_stt_training/train.py", line 392, in train_impl
gradients, loss, non_finite_files = get_tower_results(
File "home/xxxx/Coqui-STT/training/coqui_stt_training/train.py", line 172, in get_tower_results
avg_loss, non_finite_files = calculate_mean_edit_distance_and_loss(
File "home/xxxx/Coqui-STT/training/coqui_stt_training/train.py", line 95, in calculate_mean_edit_distance_and_loss
total_loss = tfv1.nn.ctc_loss(
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/ops/ctc_ops.py", line 159, in ctc_loss
return _ctc_loss_impl(labels, inputs, sequence_length,
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/ops/ctc_ops.py", line 195, in _ctc_loss_impl
loss, _ = ctc_loss_func(
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/ops/gen_ctc_ops.py", line 329, in ctc_loss
_, _, _op = _op_def_lib._apply_op_helper(
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 792, in _apply_op_helper
op = g.create_op(op_type_name, inputs, dtypes=None, name=scope,
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/util/deprecation.py", line 513, in new_func
return func(*args, **kwargs)
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py", line 3356, in create_op
return self._create_op_internal(op_type, inputs, dtypes, input_types, name,
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py", line 3418, in _create_op_internal
ret = Operation(
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
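The numbers in the error message can also be pulled out of a training log with a small hypothetical helper (not part of Coqui STT or TensorFlow). The `23` right after the closing parenthesis appears to be the position of the failing sample within its batch, judging by how TensorFlow formats this message; treat that as an assumption. In this log the sample needs 40 time steps but has only 39, so it is short by a single frame (20 ms with the default step), which points to a borderline clip rather than a grossly mismatched transcript.

```python
import re

# Matches e.g. "...(required: 40, available: 39)23You can turn this error...".
# The trailing number is assumed to be the sample's index within the batch.
CTC_ERROR = re.compile(
    r"Not enough time for target transition sequence "
    r"\(required: (\d+), available: (\d+)\)(\d+)"
)


def parse_ctc_errors(log_text):
    """Return the distinct (required, available, batch_index) triples in a log."""
    found = []
    for match in CTC_ERROR.finditer(log_text):
        triple = tuple(int(g) for g in match.groups())
        if triple not in found:  # the same error is usually reported several times
            found.append(triple)
    return found
```

For example, `parse_ctc_errors(open("train.log").read())` (file name hypothetical) on the log above would yield a single entry for the failing sample.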