Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate scopes #2671

Merged
merged 25 commits into from
Aug 28, 2024
Merged

Deprecate scopes #2671

merged 25 commits into from
Aug 28, 2024

Conversation

pingsutw
Copy link
Member

@pingsutw pingsutw commented Aug 9, 2024

Tracking issue

NA

Why are the changes needed?

Error messages are not clear in both remote and local execution.

What changes were proposed in this pull request?

  • Add FlyteUserRuntimeException. it is thrown when a user code raises an exception.
  • Non-FlyteUserRuntimeException errros are system errors
  • remove system_entry_point and user_entry_point in the code
  • deprecate FlyteScopedException

How was this patch tested?

unit test / remote

Setup process

from flytekit import task, workflow, ImageSpec

new_flytekit = "git+https://github.com/flyteorg/flytekit@14ab38ca17d3b9a5bb70e5ae9a61955b01f573e4"

sd_finetuning_image = ImageSpec(
    registry="pingsutw",
    apt_packages=["git"],
    packages=[
        new_flytekit,
    ],
    builder="envd",
)


@task(enable_deck=True, container_image=sd_finetuning_image)
def t1(a: int) -> int:
    raise ValueError("error")
    return a + 1


@workflow
def wf() -> int:
    return t1(a=3)

Screenshots

Before:

  • User error
Screenshot 2024-08-10 at 12 35 01 AM
  • System error
Screenshot 2024-08-10 at 2 50 54 AM
  • Local error
Screenshot 2024-08-10 at 2 33 12 AM
  • User error
Screenshot 2024-08-10 at 2 14 31 AM
  • System error
Screenshot 2024-08-10 at 2 47 06 AM
  • Local error
Screenshot 2024-08-10 at 2 33 01 AM
  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

NA

Docs link

NA

Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Copy link

codecov bot commented Aug 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.00%. Comparing base (2452c74) to head (f253efe).
Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2671      +/-   ##
==========================================
+ Coverage   83.55%   93.00%   +9.45%     
==========================================
  Files           3       26      +23     
  Lines         152     1230    +1078     
==========================================
+ Hits          127     1144    +1017     
- Misses         25       86      +61     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
@@ -272,7 +272,8 @@ async def _do(
agent.do, task_template=template, inputs=literal_map, output_prefix=output_prefix
)
except Exception as e:
raise FlyteUserException(f"Failed to run the task {self.name} with error: {e}") from None
e.args = (f"Failed to run the task {self.name} with error: {e.args[0]}",)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will directly modify the error message instead of raising another exception.

Signed-off-by: Kevin Su <[email protected]>
@pingsutw pingsutw changed the title [WIP] Deprecate scopes Deprecate scopes Aug 11, 2024
@pingsutw pingsutw self-assigned this Aug 11, 2024
@pingsutw pingsutw marked this pull request as ready for review August 11, 2024 05:03
@kumare3
Copy link
Contributor

kumare3 commented Aug 11, 2024

😱 this is awesome

flytekit/exceptions/user.py Outdated Show resolved Hide resolved
@@ -15,6 +16,7 @@

class FlyteScopedException(Exception):
def __init__(self, context, exc_type, exc_value, exc_tb, top_trim=0, bottom_trim=0, kind=None):
warn(f"{self.__class__.__name__} is deprecated.", DeprecationWarning, stacklevel=2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When do we remove deprecated items? flytekit 2.0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't want to wait until 2.0, maybe 6 months after

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you create a github issue to track this deprecation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flytekit/bin/entrypoint.py Outdated Show resolved Hide resolved
Comment on lines 188 to 189
tb_str = "\n ".join([""] + lines)
format_str = "Traceback (most recent call last):\n" "{traceback}\n" "\n" "Message:\n" "\n" " {message}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding a newline in the beginning and joining, which creates another list. Does this work?

Suggested change
tb_str = "\n ".join([""] + lines)
format_str = "Traceback (most recent call last):\n" "{traceback}\n" "\n" "Message:\n" "\n" " {message}"
tb_str = "\n ".join(lines)
format_str = "Traceback (most recent call last):\n" "\n {traceback}\n" "\n" "Message:\n" "\n" " {message}"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated it. new error

Screenshot 2024-08-12 at 2 08 48 PM

Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
wild-endeavor
wild-endeavor previously approved these changes Aug 13, 2024
Copy link
Contributor

@wild-endeavor wild-endeavor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

# If the task is being executed locally, we want to raise the original exception
e.args = (f"Error encountered while executing '{self.name}':\n {e.args[0]}",)
raise
raise FlyteUserRuntimeException(e) from e
Copy link
Contributor

@wild-endeavor wild-endeavor Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a bit odd? to raise a new type using e but also from e. Is this what you're supposed to do? I guess the printing strips out the top layer and only looks at the __cause__ right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. We should have a more specific error message there. Something like "flytekit runtime error. Original error message: {e}".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it was a common use case.

Screenshot 2024-08-17 at 10 35 29 PM

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this, so I can get the user stack trace from e.__cause__.__traceback.
Otherwise, I have to save it in the FlyteUserRuntimeException. like

raise FlyteUserRuntimeException(*_exc_info())

@@ -185,6 +180,17 @@ def _dispatch_execute(
exit(1)


def get_traceback_str(e: Exception) -> str:
tb = e.__cause__.__traceback__ if e.__cause__ else e.__traceback__
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we make this recursive?

Are you happy with the way these look? should we try to format them better? This is only used in actual runs right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just added a comment here. we should only use e.__cause__.__traceback__ when the error is FlyteUserRuntimeException

Copy link
Collaborator

@eapolinario eapolinario left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty cool!

Worth calling out that the exception handling code is no longer re-entrant. For example, an exception coming from user code called by system code will no longer be considered a system exception, since now we'll only consider system errors any exception that's not a FlyteUserRuntimeException.

flytekit/bin/entrypoint.py Outdated Show resolved Hide resolved
flytekit/core/base_task.py Show resolved Hide resolved
# If the task is being executed locally, we want to raise the original exception
e.args = (f"Error encountered while executing '{self.name}':\n {e.args[0]}",)
raise
raise FlyteUserRuntimeException(e) from e
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. We should have a more specific error message there. Something like "flytekit runtime error. Original error message: {e}".

flytekit/exceptions/scopes.py Show resolved Hide resolved
Comment on lines 16 to 17
FlyteUserRuntimeException is thrown when a user code raises an exception.
@param exc_value: The exception that was raised from user code.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you fix the docstring to follow the rest of the codebase? For example

See the documentation for :py:class:`flytekit.reference_task` and :py:class:`flytekit.reference_workflow` as well.
This function is the general form of the two aforementioned functions. It's better for programmatic usage, as
the interface is passed in as arguments instead of analyzed from type annotations.
.. literalinclude:: ../../../tests/flytekit/unit/core/test_references.py
:start-after: # docs_ref_start
:end-before: # docs_ref_end
:language: python
:dedent: 4
:param resource_type: This is the type of entity it is. Must be one of
:py:class:`flytekit.models.core.identifier.ResourceType`
:param project: The project the entity you're looking for has been registered in.
:param domain: The domain the entity you're looking for has been registered in.
:param name: The name of the registered entity
:param version: The version the entity you're looking for has been registered with.
:param inputs: An ordered dictionary of input names as strings to their Python types.
:param outputs: An ordered dictionary of output names as strings to their Python types.
:return:
.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated it, thanks

@@ -122,48 +120,45 @@ def _dispatch_execute(
)

# Handle user-scoped errors
except _scoped_exceptions.FlyteScopedUserException as e:
except FlyteUserRuntimeException as e:
# Step3b
if isinstance(e.value, IgnoreOutputs):
logger.warning(f"User-scoped IgnoreOutputs received! Outputs.pb will not be uploaded. reason {e}!!")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a User-scoped error. What should we change?

flytekit/bin/entrypoint.py Show resolved Hide resolved
exc_str,
_error_models.ContainerError.Kind.RECOVERABLE,
_execution_models.ExecutionError.ErrorKind.SYSTEM,
)
)
logger.error(f"Exception when executing task {task_def.name or task_def.id.name}, reason {str(e)}")
logger.error("!! Begin Unknown System Error Captured by Flyte !!")
logger.error(f"Exception when executing task {task_def.name}, reason {str(e)}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is task_def.id.name being dropped?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

task_def doesn't have id attribute

Screenshot 2024-08-17 at 10 19 51 PM

flytekit/bin/entrypoint.py Show resolved Hide resolved
@@ -15,6 +16,7 @@

class FlyteScopedException(Exception):
def __init__(self, context, exc_type, exc_value, exc_tb, top_trim=0, bottom_trim=0, kind=None):
warn(f"{self.__class__.__name__} is deprecated.", DeprecationWarning, stacklevel=2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you create a github issue to track this deprecation?

Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
This reverts commit 3c031c7.
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
@pingsutw pingsutw merged commit cf2f492 into master Aug 28, 2024
101 of 103 checks passed
msg = f"Failed to convert inputs of task '{self.name}':\n {exc}"
logger.error(msg)
raise TypeError(msg) from None
exc.args = (f"Failed to convert inputs of task '{self.name}':\n {exc.args[0]}",)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually starts breaking behavior for things like AssertionError where someone might just do assert val > 0 without specifying a message like assert val > 0, "val must be greater than 0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean flytekit raises an exception when using AssertionError in the user code because there is no message in the args?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, we get IndexError: tuple index out of range when a test case throws an empty AssertionError.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be more specific, I mentioned test case only because that's how I caught this bug when migrating from 1.12.x to 1.13.x. The task is what is actually doing the assert val > 0 check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blaketastic2 do you want to open a PR to fix it? otherwise, I'll open one later this week.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take care of it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you in advance!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems pretty straight forward - #2831

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants