Add the Flyte agent to provision and manage K8s (data) service for deep learning (GNN) use cases #3004
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@             Coverage Diff              @@
##           master    #3004        +/-   ##
===========================================
+ Coverage   51.08%   90.46%   +39.38%
===========================================
  Files         201      100      -101
  Lines       21231     4920    -16311
  Branches     2731        0     -2731
===========================================
- Hits        10846     4451     -6395
+ Misses       9787      469     -9318
+ Partials      598        0      -598
This is amazing!!! Left some minor comments.
plugins/flytekit-k8sdataservice/flytekitplugins/k8sdataservice/agent.py
plugins/flytekit-k8sdataservice/flytekitplugins/k8sdataservice/agent.py
plugins/flytekit-k8sdataservice/k8s_ops/k8s-service-agent-rolebinding.yaml
plugins/flytekit-k8sdataservice/flytekitplugins/k8sdataservice/k8s/manager.py
plugins/flytekit-k8sdataservice/tests/k8sdataservice/test_agent.py
plugins/flytekit-k8sdataservice/tests/k8sdataservice/test_agent.py
…In internal things removed Signed-off-by: Shuying Liang <[email protected]>
Signed-off-by: Shuying Liang <[email protected]>
Why are the changes needed?
Graph Neural Networks are critical for understanding complex relationships across LinkedIn's professional networks. However, training these models at scale involves intricate data loading, sampling, and processing across multiple nodes and GPUs. The missing piece is the infrastructure to support how and where to run these Kubernetes data services, making them scalable and reliable along with the training or inference processes.
To simplify this complex orchestration pipeline, we decided to leverage the Flyte agent framework to provision and manage the data services for GNN use cases.
What changes were proposed in this pull request?
This PR adds a Flyte agent that creates, updates, and deletes the Kubernetes StatefulSet and Service backing the data service.
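As a rough illustration of the two Kubernetes objects involved, the sketch below builds a StatefulSet manifest and a matching headless Service manifest as plain Python dicts. It is not the code from this PR: the function names, the `gnn-data-service` name, the image, and the port are all hypothetical, and the actual agent may construct and submit these objects differently.

```python
import json


def build_statefulset(name: str, image: str, replicas: int, port: int) -> dict:
    """Return an apps/v1 StatefulSet manifest as a plain dict (illustrative only)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # serviceName must reference the Service that governs the pods.
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


def build_headless_service(name: str, port: int) -> dict:
    """Return a v1 headless Service manifest selecting the StatefulSet pods."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            # clusterIP "None" makes the Service headless, so each pod gets
            # a stable DNS entry instead of a single load-balanced IP.
            "clusterIP": "None",
            "selector": {"app": name},
            "ports": [{"port": port, "targetPort": port}],
        },
    }


if __name__ == "__main__":
    # Hypothetical values, chosen only for the example.
    sts = build_statefulset("gnn-data-service", "example.com/data-service:latest", 2, 40000)
    svc = build_headless_service("gnn-data-service", 40000)
    print(json.dumps(sts, indent=2))
    print(json.dumps(svc, indent=2))
```

Manifests like these could be submitted with `kubectl apply` or a Kubernetes client library; an agent would then be responsible for creating them, polling their status, and deleting them when the training or inference job finishes.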
How was this patch tested?
MPIJobs (for deep learning GNN training) or TFJob (for offline inference)
Setup process
pip install flytekitplugins-k8sdataservice
Screenshots
Check all the applicable boxes
Docs link
Blog from Flyte community sync