Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add shared_memory to task with extended resources #3096

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

thomasjpfan
Copy link
Member

@thomasjpfan thomasjpfan commented Jan 25, 2025

Tracking issue

Towards flyteorg/flyte#6142
Requires flyteorg/flyte#6193

Why are the changes needed?

This PR adds shared memory as an extend resource, that is made available through @task(shared_memory). For the simple case, you can have @task(shared_memory=True), which means: "memory backed volumes are sized to node allocatable memory". Otherwise, you can set shared_memory="2Gi" to specify the value.

What changes were proposed in this pull request?

This PR adds shared_memory to the user facing API and pushes the extended resources into the IDL.

How was this patch tested?

Unit tests were added to this PR and tested with flytekit changes:

import os
from flytekit import task, ImageSpec

image = ImageSpec(
    name="flytekit",
    apt_packages=["git"],
    registry="localhost:30000",
    commands=[
        "uv pip install git+https://github.com/thomasjpfan/flyte.git@65dda339b0088d9e568877577fa78fc88b223582#subdirectory=flytekit"
        "uv pip install git+https://github.com/thomasjpfan/flyte.git@d2c76ff330077875f7826c278f660add7f2c50a9#subdirectory=flyteidl"
    ],
)


@task(container_image=image)
def check_shm2() -> bool:
    return os.path.exists("/dev/shm")

Summary by Bito

This PR implements shared memory support for Flyte tasks, enabling specification of memory requirements via @task decorator. The implementation modifies the shared_memory parameter type to accept both boolean and string values, allowing users to either allocate node memory using boolean flags or specify exact memory sizes (e.g., '2Gi'). The changes include resource construction utilities and task configuration handling modifications for more precise memory allocation control.

Unit tests added: True

Estimated effort to review (1-5, lower is better): 3

@flyte-bot
Copy link
Contributor

flyte-bot commented Jan 25, 2025

Code Review Agent Run #3d3e41

Actionable Suggestions - 4
  • tests/flytekit/unit/core/test_resources.py - 1
    • Consider consolidating redundant test cases · Line 162-164
  • flytekit/core/constants.py - 1
    • Insecure hardcoded temporary file path · Line 44-44
  • flytekit/core/python_auto_container.py - 1
    • Consider adding shared memory format validation · Line 54-54
  • flytekit/core/task.py - 1
Review Details
  • Files reviewed - 10 · Commit Range: ac61755..aecce00
    • flytekit/core/constants.py
    • flytekit/core/node.py
    • flytekit/core/python_auto_container.py
    • flytekit/core/resources.py
    • flytekit/core/task.py
    • flytekit/tools/script_mode.py
    • tests/flytekit/unit/core/test_array_node_map_task.py
    • tests/flytekit/unit/core/test_node_creation.py
    • tests/flytekit/unit/core/test_resources.py
    • tests/flytekit/unit/models/test_tasks.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

AI Code Review powered by Bito Logo

@thomasjpfan thomasjpfan changed the title Add shared_memory_volume to task Add shared_memory to task with extended resources Jan 25, 2025
@flyte-bot
Copy link
Contributor

flyte-bot commented Jan 25, 2025

Changelist by Bito

This pull request implements the following key changes.

Key Change Files Impacted
New Feature - Shared Memory Support for Tasks

constants.py - Added shared memory mount constants

node.py - Added shared memory support to node configuration

resources.py - Implemented shared memory resource construction

task.py - Added shared memory parameter to task decorator

python_auto_container.py - Extended container task with shared memory support

Testing - Unit Tests for Shared Memory Feature

test_array_node_map_task.py - Added tests for shared memory in map tasks

test_node_creation.py - Added tests for shared memory overrides

test_resources.py - Added tests for shared memory resource construction

test_tasks.py - Updated task model tests with shared memory

Other Improvements - Code Cleanup and Import Updates

script_mode.py - Added commented code for future module handling

resources.py - Updated imports and type hints

Comment on lines +162 to +164
@pytest.mark.parametrize("shared_memory", [None, False])
def test_construct_extended_resources_shared_memory_none(shared_memory):
resources = construct_extended_resources(shared_memory=shared_memory)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider consolidating redundant test cases

Consider consolidating the test cases for None and False into a single test case since they produce the same behavior. Both values result in resources being None.

Code suggestion
Check the AI-generated fix before applying
Suggested change
@pytest.mark.parametrize("shared_memory", [None, False])
def test_construct_extended_resources_shared_memory_none(shared_memory):
resources = construct_extended_resources(shared_memory=shared_memory)
def test_construct_extended_resources_shared_memory_none():
resources = construct_extended_resources(shared_memory=None)

Code Review Run #3d3e41


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged


# Shared memory mount name and path
SHARED_MEMORY_MOUNT_NAME = "flyte-shared-memory"
SHARED_MEMORY_MOUNT_PATH = "/dev/shm"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Insecure hardcoded temporary file path

Consider using a more secure temporary file location instead of hardcoding '/dev/shm'. The shared memory directory could potentially be accessed by other processes on the system. Consider using 'tempfile.gettempdir()' to get a secure temporary directory location.

Code suggestion
Check the AI-generated fix before applying
Suggested change
SHARED_MEMORY_MOUNT_PATH = "/dev/shm"
import tempfile
SHARED_MEMORY_MOUNT_PATH = tempfile.gettempdir()

Code Review Run #3d3e41


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged

@@ -51,6 +51,7 @@ def __init__(
pod_template: Optional[PodTemplate] = None,
pod_template_name: Optional[str] = None,
accelerator: Optional[BaseAccelerator] = None,
shared_memory: Optional[Union[Literal[True], str]] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding shared memory format validation

Consider validating the shared_memory parameter when it's a string to ensure it follows memory size format (e.g., '1Gi', '512Mi'). Currently there's no validation for the string format.

Code suggestion
Check the AI-generated fix before applying
Suggested change
shared_memory: Optional[Union[Literal[True], str]] = None,
shared_memory: Optional[Union[Literal[True], str]] = None,
if shared_memory and isinstance(shared_memory, str):
import re
if not re.match(r'^[0-9]+(Ki|Mi|Gi|Ti|Pi|Ei|[KMGTPE]i?)?$', shared_memory):
raise ValueError(
f"Invalid shared memory format: {shared_memory}. "
"Must be a valid memory size (e.g., '1Gi', '512Mi')"
)

Code Review Run #3d3e41


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged

@@ -211,6 +213,7 @@ def task(
pod_template_name: Optional[str] = None,
accelerator: Optional[BaseAccelerator] = None,
pickle_untyped: bool = False,
shared_memory: Optional[Union[bool, str]] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding shared memory validation

Consider adding validation for the shared_memory parameter to ensure it is either a boolean or a valid memory size string (e.g. '1Gi', '512Mi'). Currently there is no validation which could lead to runtime errors.

Code suggestion
Check the AI-generated fix before applying
Suggested change
shared_memory: Optional[Union[bool, str]] = None,
shared_memory: Optional[Union[bool, str]] = None,
def validate_shared_memory(val: Optional[Union[bool, str]]) -> None:
if val is not None and not isinstance(val, bool):
if not isinstance(val, str) or not re.match(r'^[0-9]+(Mi|Gi)$', val):
raise ValueError('shared_memory must be a boolean or valid memory size string (e.g. "1Gi", "512Mi")')
if shared_memory is not None:
validate_shared_memory(shared_memory)

Code Review Run #3d3e41


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged

@kumare3
Copy link
Contributor

kumare3 commented Jan 26, 2025

Did we make a backend change for this? Can we also add tpu support

@thomasjpfan
Copy link
Member Author

Did we make a backend change for this?

I did the backend change here: flyteorg/flyte#6193

Can we also add tpu support

I expect TPU to be similiar to other accerlators. From GKE's docs, TPUs is another taint: https://cloud.google.com/kubernetes-engine/docs/concepts/tpus#how_tpus_work,

Signed-off-by: Thomas J. Fan <[email protected]>
@flyte-bot
Copy link
Contributor

flyte-bot commented Jan 26, 2025

Code Review Agent Run #da14ac

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: aecce00..eadab88
    • flytekit/core/node.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

AI Code Review powered by Bito Logo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants