How to identify data that generates "Invalid argument: Not enough time for target transition sequence" #2188

Craya · 2022-04-15T13:37:39Z


Dear Coqui Community,

As former DeepSpeech users, we are now switching to coqui-stt v1.3.0. We mainly fine-tune on home-made technical datasets.

We are facing a "classic" issue with some of our data: "Invalid argument: Not enough time for target transition sequence". We more or less understand the technical reason, and we added a watchdog that discards audio clips that are too short for their number of words (based on the average time per word in our datasets), but the problem popped up again when we switched to coqui-stt.
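To make the condition behind the error concrete, here is a minimal sketch of the arithmetic as we understand it: CTC needs at least one time step per target label, so a clip is rejected when its transcript has more labels than the clip has feature frames. The function names are hypothetical, and the defaults (16 kHz audio, 32 ms window, 20 ms step, one label per character, no padding of the last window) are assumptions about the feature pipeline, not values confirmed for our setup.

```python
def count_feature_frames(num_samples, sample_rate=16000, win_len_ms=32, win_step_ms=20):
    """Number of frames from a sliding window without padding (assumed behaviour)."""
    window = sample_rate * win_len_ms // 1000  # 512 samples with the assumed defaults
    step = sample_rate * win_step_ms // 1000   # 320 samples with the assumed defaults
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // step


def ctc_has_enough_time(transcript, num_samples, **feature_kwargs):
    """True if the clip has at least one frame per target label."""
    required = len(transcript)  # assumes one label per character (alphabet mode)
    available = count_feature_frames(num_samples, **feature_kwargs)
    return required <= available


# A 0.8 s clip at 16 kHz gives 39 frames under these assumptions,
# so a 40-character transcript does not fit.
print(count_feature_frames(12800))            # 39
print(ctc_has_enough_time("x" * 40, 12800))   # False
```

Under these assumptions, a check based on characters per frame would be closer to what CTC enforces than a check based on average time per word, since a few long technical words can exceed the frame budget even when the word count looks fine.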

We know how to work around the problem (adding ignore_longer_outputs_than_inputs=True to the ctc_loss call in train.py and evaluate.py), but the question is how much data is affected. Could it introduce a bias in our training?

I read this topic and tried the proposed code, but it doesn't work anymore.

Could you help with that? How can I identify which samples in my dataset are invalid?
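For reference, this is the kind of offline check we have in mind, as a rough sketch only. It assumes the usual CSV layout with `wav_filename` and `transcript` columns, PCM WAV files readable by the standard `wave` module, relative paths resolved against the CSV's directory, one label per character, and the same assumed 32 ms / 20 ms feature window as the training defaults. `find_too_short_rows` and `frames_for` are hypothetical helper names, not part of coqui-stt.

```python
import csv
import os
import wave


def frames_for(num_samples, sample_rate, win_len_ms=32, win_step_ms=20):
    """Frame count for an unpadded sliding window (assumed feature settings)."""
    window = sample_rate * win_len_ms // 1000
    step = sample_rate * win_step_ms // 1000
    return 0 if num_samples < window else 1 + (num_samples - window) // step


def find_too_short_rows(csv_path):
    """Return (wav_filename, required, available) for rows with more labels than frames."""
    base_dir = os.path.dirname(os.path.abspath(csv_path))
    flagged = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # os.path.join keeps absolute paths unchanged.
            wav_path = os.path.join(base_dir, row["wav_filename"])
            with wave.open(wav_path, "rb") as wav:
                available = frames_for(wav.getnframes(), wav.getframerate())
            required = len(row["transcript"])  # assumes one label per character
            if required > available:
                flagged.append((row["wav_filename"], required, available))
    return flagged
```

Calling `find_too_short_rows` on each training, validation and test CSV and comparing `len(flagged)` with the total row count would give a first estimate of how much data the workaround silently skips. If the real feature pipeline resamples or pads the audio, the frame counts here would differ.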

Many thanks in advance.

Fabien.

Full stack trace:

I Performing dummy training to check for memory problems.
I If the following process crashes, you likely have batch sizes that are too big for your available system memory (or GPU memory).
I Loading best validating checkpoint from /data/220415_1250/checkpoint/best_dev-118584
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam_1
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/bias/Adam
I Loading variable from checkpoint: layer_1/bias/Adam_1
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_1/weights/Adam
I Loading variable from checkpoint: layer_1/weights/Adam_1
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/bias/Adam
I Loading variable from checkpoint: layer_2/bias/Adam_1
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_2/weights/Adam
I Loading variable from checkpoint: layer_2/weights/Adam_1
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/bias/Adam
I Loading variable from checkpoint: layer_3/bias/Adam_1
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_3/weights/Adam
I Loading variable from checkpoint: layer_3/weights/Adam_1
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/bias/Adam
I Loading variable from checkpoint: layer_5/bias/Adam_1
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_5/weights/Adam
I Loading variable from checkpoint: layer_5/weights/Adam_1
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/bias/Adam
I Loading variable from checkpoint: layer_6/bias/Adam_1
I Loading variable from checkpoint: layer_6/weights
I Loading variable from checkpoint: layer_6/weights/Adam
I Loading variable from checkpoint: layer_6/weights/Adam_1
I Loading variable from checkpoint: learning_rate
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Epoch 0 |   Training | Elapsed Time: 0:00:05 | Steps: 1 | Loss: 184.835358
Epoch 0 |   Training | Elapsed Time: 0:00:05 | Steps: 1 | Loss: 184.835358
Epoch 0 | Validation | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000 | Dataset: /data/dataset_val.csv
Epoch 0 | Validation | Elapsed Time: 0:00:03 | Steps: 1 | Loss: 191.520050 | Dataset: /data/datasets/dataset_val.csv
Epoch 0 | Validation | Elapsed Time: 0:00:03 | Steps: 1 | Loss: 191.520050 | Dataset: /data/datasets/dataset_val.csv
--------------------------------------------------------------------------------
I FINISHED optimization in 0:00:09.462788
I Dummy run finished without problems, now starting real training process.
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Epoch 0 |   Training | Elapsed Time: 0:00:04 | Steps: 1 | Loss: 184.835358
Epoch 0 |   Training | Elapsed Time: 0:00:06 | Steps: 2 | Loss: 170.022476
Epoch 0 |   Training | Elapsed Time: 0:00:08 | Steps: 3 | Loss: 165.726517
Epoch 0 |   Training | Elapsed Time: 0:00:09 | Steps: 4 | Loss: 155.419304
Epoch 0 |   Training | Elapsed Time: 0:00:11 | Steps: 5 | Loss: 150.382837
Epoch 0 |   Training | Elapsed Time: 0:00:13 | Steps: 6 | Loss: 145.862333
Epoch 0 |   Training | Elapsed Time: 0:00:15 | Steps: 7 | Loss: 142.092943
Epoch 0 |   Training | Elapsed Time: 0:00:16 | Steps: 8 | Loss: 139.344593
Epoch 0 |   Training | Elapsed Time: 0:00:18 | Steps: 9 | Loss: 139.167085
Epoch 0 |   Training | Elapsed Time: 0:00:20 | Steps: 10 | Loss: 135.617424
Epoch 0 |   Training | Elapsed Time: 0:00:22 | Steps: 11 | Loss: 133.399096
Epoch 0 |   Training | Elapsed Time: 0:00:24 | Steps: 12 | Loss: 132.209274
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1349, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1441, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Not enough time for target transition sequence (required: 40, available: 39)23You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs
         [[{{node tower_0/CTCLoss}}]]
         [[tower_0/CTCLoss/_89]]
  (1) Invalid argument: Not enough time for target transition sequence (required: 40, available: 39)23You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs
         [[{{node tower_0/CTCLoss}}]]
0 successful operations.
1 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "home/xxxx/Coqui-STT/training/coqui_stt_training/train.py", line 726, in <module>
    main()
  File "home/xxxx/Coqui-STT/training/coqui_stt_training/train.py", line 696, in main
    train()
  File "home/xxxx/Coqui-STT/training/coqui_stt_training/train.py", line 335, in train
    train_impl(epochs=Config.epochs, silent_load=True)
  File "home/xxxx/Coqui-STT/training/coqui_stt_training/train.py", line 569, in train_impl
    train_loss, _ = run_set("train", epoch, train_init_op)
  File "home/xxxx/Coqui-STT/training/coqui_stt_training/train.py", line 515, in run_set
    ) = session.run(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 955, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1179, in _run
    results = self._do_run(handle, final_targets, final_fetches,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1358, in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Not enough time for target transition sequence (required: 40, available: 39)23You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs
         [[node tower_0/CTCLoss (defined at usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
         [[tower_0/CTCLoss/_89]]
  (1) Invalid argument: Not enough time for target transition sequence (required: 40, available: 39)23You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs
         [[node tower_0/CTCLoss (defined at usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
1 derived errors ignored.

Original stack trace for 'tower_0/CTCLoss':
  File "usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "home/xxxx/Coqui-STT/training/coqui_stt_training/train.py", line 726, in <module>
    main()
  File "home/xxxx/Coqui-STT/training/coqui_stt_training/train.py", line 696, in main
    train()
  File "home/xxxx/Coqui-STT/training/coqui_stt_training/train.py", line 335, in train
    train_impl(epochs=Config.epochs, silent_load=True)
  File "home/xxxx/Coqui-STT/training/coqui_stt_training/train.py", line 392, in train_impl
    gradients, loss, non_finite_files = get_tower_results(
  File "home/xxxx/Coqui-STT/training/coqui_stt_training/train.py", line 172, in get_tower_results
    avg_loss, non_finite_files = calculate_mean_edit_distance_and_loss(
  File "home/xxxx/Coqui-STT/training/coqui_stt_training/train.py", line 95, in calculate_mean_edit_distance_and_loss
    total_loss = tfv1.nn.ctc_loss(
  File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/ops/ctc_ops.py", line 159, in ctc_loss
    return _ctc_loss_impl(labels, inputs, sequence_length,
  File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/ops/ctc_ops.py", line 195, in _ctc_loss_impl
    loss, _ = ctc_loss_func(
  File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/ops/gen_ctc_ops.py", line 329, in ctc_loss
    _, _, _op = _op_def_lib._apply_op_helper(
  File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 792, in _apply_op_helper
    op = g.create_op(op_type_name, inputs, dtypes=None, name=scope,
  File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/util/deprecation.py", line 513, in new_func
    return func(*args, **kwargs)
  File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py", line 3356, in create_op
    return self._create_op_internal(op_type, inputs, dtypes, input_types, name,
  File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py", line 3418, in _create_op_internal
    ret = Operation(
  File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()
