Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAPT playbooks - with NeMo 2.0 #12067

Merged
merged 10 commits into from
Feb 10, 2025
Merged

DAPT playbooks - with NeMo 2.0 #12067

merged 10 commits into from
Feb 10, 2025

Conversation

jvamaraju
Copy link
Contributor

Added Custom tokenization + DAPT playbooks

Previous PRs (closed but not merged)

@jvamaraju jvamaraju reopened this Feb 6, 2025
@github-actions github-actions bot removed core Changes to NeMo Core TTS ASR NLP labels Feb 6, 2025
@suiyoubi suiyoubi self-requested a review February 6, 2025 17:45
Copy link
Collaborator

@suiyoubi suiyoubi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed NeMo2 notebook

"id": "c43ef563",
"metadata": {},
"source": [
"# NeMo Tools and Resources\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's change this to nvcr.io/nvidia/nemo:25.02

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@suiyoubi, @jvamaraju was running into issues when testing with 25.02, for now we should keep it to the container we've tested it on may be? Once you push the new container with the PR for the llama2 tutorial merged we can update this

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What issue were you encountered ? I am fine with leaving it as whatever container you have tested for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, will leave it as is for now. @jvamaraju can you share details about the issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. Yes lets leave it to 24.12 for now. When we add llama27b recipes...nemo fails to recompile using 25.02 container.

"#### Benefits of bin/idx format for Pretraining:\n",
"\n",
"* **Optimized I/O Performance:** The binary format enables quicker data reads and reduces latency, allowing the model to continuously access data at high speeds.\n",
"* **Efficient Memory Usage:** Data in bin/idx format consumes less memory during loading, making it suitable for large datasets and enabling better use of available system resources.\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are modifying these in the later cells, let's remove them here

Copy link
Contributor

@aasthajh aasthajh Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which cell are you referring to?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean you are overwriting the TP settings in the later cell, you don't want to overwrite it twice to confuse the reader

@suiyoubi suiyoubi added NLP and removed audio labels Feb 7, 2025
@github-actions github-actions bot removed the NLP label Feb 7, 2025
Copy link
Collaborator

@suiyoubi suiyoubi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve for now.
We might need to change the container tag once 25.02 out.

@ko3n1g ko3n1g merged commit 7d81505 into NVIDIA:main Feb 10, 2025
34 of 35 checks passed
youngeunkwon0405 pushed a commit to youngeunkwon0405/NeMo that referenced this pull request Feb 10, 2025
* DAPT with NeMo 2.0

* DAPT with NeMo 2.0

* Apply isort and black reformatting

Signed-off-by: jvamaraju <[email protected]>

* Deleting file not needed

* Update README.md

Signed-off-by: jvamaraju <[email protected]>

* Addressing feedback from PR review for DAPT playbook with nemo 2.0

* Addressing feedback for DAPT with nemo 2.0

* Addressing feedback for DAPT with nemo 2.0- local executor

* Add Copyright

---------

Signed-off-by: jvamaraju <[email protected]>
Signed-off-by: jvamaraju <[email protected]>
Co-authored-by: jvamaraju <[email protected]>
Co-authored-by: aastha <[email protected]>
Co-authored-by: Ao Tang <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants