Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Protect against errors raised when adding a request to the engine #230

Merged
merged 5 commits into from
Mar 15, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 12 additions & 8 deletions serve/mlc_serve/engine/staging_engine.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,9 +21,9 @@
ScopedInferenceEngine,
SequenceOutput,
)
from .error import TextGenerationError
from .engine_common import get_new_request_state, prepare_output
from .model_module import ModelModule, TokenizerModule
from ..model.base import get_model_artifact_config
from .staging_engine_worker import (
AddRequestsCommand,
CancelRequestCommand,
Expand Down Expand Up @@ -119,13 +119,17 @@ def add(self, requests: list[Request]):
assert isinstance(req.stopping_criteria.stop_sequences, list)

# If the request violates the tokenization, this returns None, so skip.
state = get_new_request_state(
req,
self.conversation_template,
self.tokenizer,
self.model_artifact_config.vocab_size,
)
new_request_states.append(state)
try:
state = get_new_request_state(
req,
self.conversation_template,
self.tokenizer,
self.model_artifact_config.vocab_size,
)
new_request_states.append(state)
except Exception as e:
LOG.warn("Failed to add a request", request_id=req.request_id)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it be better to just throw the error here, and catch it at https://github.com/octoml/ollm/blob/d29be36231e666f761a2fb08dbf0e4ce758618f4/mlc-serve/mlc_serve/engine/async_connector.py#L153? I think initially we just assumed engine.add cannot fail. Now it might be a good time to revisit this assumption. One caveat of the approach of deferring error reporting to the engine.step is that, streaming API can no longer returns error status code once streaming begins (because of how server-sent event works)

Just sharing some thoughts. No need to block this PR if it's related to production issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that for its simplicity, and it also removes the need for an additional lock. I reworked this PR according to your suggestion, but I'm not sure what to do after catching the error in async_connector.py. The following seems to work, but is this the right way?

try:
    await asyncio.to_thread(self.engine.add, [request])
except TextGenerationError as e:
    raise asyncio.CancelledError(e)

Copy link

@yelite yelite Mar 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think any change is needed in async_connector, but I am not fully sure. The TextGenerationError will just propagate to the http handler and the regular error handling can happen there. Because it's the engine.add that fails, we don't need to call engine.cancel either (as in https://github.com/octoml/ollm/blob/de6378ee6a1391276530e94b7b4374f01792c8ae/mlc-serve/mlc_serve/engine/async_connector.py#L99)

Copy link

@yelite yelite Mar 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One benefit of throwing error in engine.add is that the http handler will be able to respond with failure http status code for streaming requests. Throwing a CancelledError will confuse the handler

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed that not catching the error in async_connector.py still keeps the server from dying.

raise TextGenerationError(str(e))

self.command_queue.put(AddRequestsCommand(request_states=new_request_states))

Expand Down
Loading