Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

changed asr models outputs to be consistent #11818

Merged
merged 13 commits into from
Feb 12, 2025

Conversation

Ssofja
Copy link
Collaborator

@Ssofja Ssofja commented Jan 10, 2025

What does this PR do ?

This PR is making all ASR models outputs to be consistent and always user can expect List[Hypothesis] by default.

Collection: ASR

Changelog

  • Changed ASR Models (RNNT, CTC,RNNT/CTC hybrid and AED) outputs.
    • List[Hypothesis]
    • If the type of hypotheses is NBestHypotheses, models return List[List[Hypothesis]]
  • Renamed Timestep to Timestamp in the case of these models
  • Added some changes in codes based on the first bullet point

Usage

From command-line

with transcribe_speech.py script

python transcribe_speech.py pretrained_name="model_name" \
dataset_manifest=<manifest_path> \
output_filename=<output_filename> 

From Python Env

The change is done for CTC, RNNT and AED models

Previous version

Where best_hypotheses is List[str], all_hypotheses is List[List[str]]
And in the case when return_hypotheses is True it returns a tuple of List[Hypothesis] and Optional(List[List[Hypothesis]])

from nemo.collections.asr.models import ASRModel
model = ASRModel.from_pretrained('<model_name>')
best_hypotheses, all_hypotheses = model.transcribe(['<file_path>'], return_hypotheses=False)

New version

where hypotheses is List[Hypothesis] or if hypothesis type is NBestHypotheses, the function will return List[List[Hypothesis]]. In hypotheses here only y_sequence, score, text variables are set, all other variables are default values
If return_hypotheses is True all variables of hypotheses are set

from nemo.collections.asr.models import ASRModel
model = ASRModel.from_pretrained('<model_name>')
hypotheses = model.transcribe(['<file_path>'], return_hypotheses=False) 
Each output contains hypothesis object and user can get text, timestamp and other arguments from Hypothesis data class:
It is important to remember, that in previous versions transcribe method had an argument timestep which was renamed to timestamp

output = model.transcribe(['<file_path>']) 
text = output[0].text
timestamp = output[0].timestamp #-> but this returns [] as timestamps were not requested

#Another case is 

output = model.transcribe(['<file_path>'], timestamps=True) 
text = output[0].text
timestamp = output[0].timestamp

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

PR Type:

  • New Feature
  • Bugfix
  • Documentation
  • Refactoring

@Ssofja Ssofja force-pushed the asr_models_output_consistency branch from ed59fd2 to 3ca6a8a Compare January 15, 2025 12:43
@Ssofja Ssofja added Run CICD and removed Run CICD labels Jan 15, 2025
@Ssofja Ssofja force-pushed the asr_models_output_consistency branch from 3ca6a8a to 4232b64 Compare January 17, 2025 13:51
@Ssofja Ssofja added Run CICD and removed Run CICD labels Jan 17, 2025
@nithinraok
Copy link
Collaborator

nithinraok commented Jan 23, 2025

Overall looks good! Thank you.
Changes required:

  1. rename return_hypotheses -> return_all_hypotheses
  2. move from draft to ready
  3. Make sure all variables from Hypotheses return None if not set previously using return_all_hypotheses
  4. update timestep to timestamp in Hypothesis object (used with timestamps=True feature)
  5. Update docs with this feature update
  6. Update PR changelog to explain changes and usage as similar to Timestamps to transcribe #10950

@Ssofja Ssofja marked this pull request as ready for review February 6, 2025 17:24
@Ssofja Ssofja force-pushed the asr_models_output_consistency branch from ee896e4 to 8549980 Compare February 6, 2025 18:33
nithinraok
nithinraok previously approved these changes Feb 6, 2025
Copy link
Collaborator

@nithinraok nithinraok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comments, otherwise LGTM

nemo/collections/asr/parts/mixins/transcription.py Outdated Show resolved Hide resolved
nemo/collections/asr/parts/mixins/transcription.py Outdated Show resolved Hide resolved
nemo/collections/asr/parts/mixins/transcription.py Outdated Show resolved Hide resolved
tutorials/asr/ASR_Context_Biasing.ipynb Outdated Show resolved Hide resolved
Copy link
Collaborator

@titu1994 titu1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only glanced at it but this PR is full of breaking changes. Upto @nithinraok if he's ok with it, I don't see the reason for most of these naming changes.

nemo/collections/asr/metrics/bleu.py Outdated Show resolved Hide resolved
nemo/collections/asr/parts/mixins/transcription.py Outdated Show resolved Hide resolved
@ko3n1g ko3n1g added Run CICD and removed Run CICD labels Feb 6, 2025
@Ssofja Ssofja force-pushed the asr_models_output_consistency branch from bd17125 to 3987d43 Compare February 7, 2025 12:48
@nithinraok nithinraok enabled auto-merge (squash) February 7, 2025 14:28
nithinraok
nithinraok previously approved these changes Feb 7, 2025
nemo/collections/asr/parts/mixins/transcription.py Outdated Show resolved Hide resolved
@Ssofja Ssofja force-pushed the asr_models_output_consistency branch 4 times, most recently from 5256383 to 4f60b78 Compare February 10, 2025 13:29
Ssofja and others added 12 commits February 10, 2025 17:30
Signed-off-by: Ssofja <[email protected]>
Signed-off-by: Ssofja <[email protected]>
Signed-off-by: Ssofja <[email protected]>
Signed-off-by: Ssofja <[email protected]>
Signed-off-by: Ssofja <[email protected]>
Signed-off-by: Ssofja <[email protected]>
Signed-off-by: Ssofja <[email protected]>
Signed-off-by: Ssofja <[email protected]>
Signed-off-by: Ssofja <[email protected]>
Signed-off-by: Ssofja <[email protected]>
Signed-off-by: Ssofja <[email protected]>
Signed-off-by: Ssofja <[email protected]>
Signed-off-by: Ssofja <[email protected]>
Signed-off-by: Ssofja <[email protected]>
Signed-off-by: Ssofja <[email protected]>
@Ssofja Ssofja force-pushed the asr_models_output_consistency branch from 4f60b78 to a4176a2 Compare February 10, 2025 13:31
@Ssofja Ssofja added Run CICD and removed Run CICD labels Feb 10, 2025
@Ssofja Ssofja force-pushed the asr_models_output_consistency branch from a4176a2 to cd6a13c Compare February 10, 2025 19:22
@Ssofja Ssofja added Run CICD and removed Run CICD labels Feb 10, 2025
Signed-off-by: Ssofja <[email protected]>
Signed-off-by: Ssofja <[email protected]>
@Ssofja Ssofja force-pushed the asr_models_output_consistency branch from cd6a13c to 30b3ec1 Compare February 11, 2025 17:22
@Ssofja Ssofja added Run CICD and removed Run CICD labels Feb 11, 2025
Copy link
Contributor

[🤖]: Hi @Ssofja 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully

So it might be time to merge this PR or get some approvals

I'm just a bot so I'll leave it you what to do next.

//cc @pablo-garay @ko3n1g

Copy link
Collaborator

@nithinraok nithinraok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please remember to update:

  1. huggingface model cards and spaces using main
  2. Add to docs

@nithinraok nithinraok merged commit ee543c2 into main Feb 12, 2025
224 checks passed
@nithinraok nithinraok deleted the asr_models_output_consistency branch February 12, 2025 00:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants