[torch-frontend] add stablehlo IRs for Mixtral model. #254

Open
wants to merge 4 commits into
base: main
from

Vremold
Collaborator

@Vremold Vremold commented May 16, 2024

In this PR, we provide the StableHLO IR of a single Mixtral decoder layer, generated with the ByteIR stack. The IR is printed with the --mlir-elide-resource-strings-if-larger=1000 option, so dialect resources that store model weights and exceed that limit are elided rather than displayed in the IR.

Note: compilation currently succeeds only with some local patches:

  1. We eliminate torch.runtime.assert during the StableHLO conversion, as we haven't decided how to handle it yet.
  2. We need the patches from PR 3322 and PR 3085 in torch-mlir.

@Vremold Vremold requested a review from qingyunqu May 16, 2024 17:01
@liwenchangbdbz liwenchangbdbz added the enhancement (New feature or request) label May 21, 2024
@Vremold
Collaborator Author

Vremold commented May 30, 2024

Update at 2024.05.31.

We add the StableHLO IR of a whole Mixtral 8x7B model. Note that, to reduce compilation time and memory consumption, we convert the large weights into splat DenseElementsAttrs. See frontends/torch-frontend/examples/inference/mixtral/infer_single_mixtral.py for how to run it.
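
For readers unfamiliar with splat constants, here is a rough sketch of why the conversion saves memory. It is not the code used in this PR: the function names, the element-count threshold, the 2-byte element size, and the example weight shapes are all assumptions made for illustration. A splat attribute stores a single value for every element, so the original weight values are not preserved in the resulting IR.

```python
from math import prod

# Hypothetical threshold: weights with more elements than this become splats.
SPLAT_THRESHOLD = 1_000_000


def dense_bytes(shape, bytes_per_element):
    """Payload size when every element of the weight is stored."""
    return prod(shape) * bytes_per_element


def plan_weight(shape, bytes_per_element=2, threshold=SPLAT_THRESHOLD):
    """Decide how a weight would be stored and how large its payload would be.

    A splat keeps one value plus the shape, so its payload is a single
    element regardless of how large the tensor is.
    """
    if prod(shape) > threshold:
        return {"kind": "splat", "payload_bytes": bytes_per_element}
    return {"kind": "dense", "payload_bytes": dense_bytes(shape, bytes_per_element)}


if __name__ == "__main__":
    # Assumed shapes: one large projection matrix and one small norm vector.
    for name, shape in [("projection", (14336, 4096)), ("norm", (4096,))]:
        plan = plan_weight(shape)
        print(name, shape, plan["kind"], plan["payload_bytes"])
```

Under these assumptions the large matrix shrinks from roughly 117 MB of payload to a single element, while small tensors stay dense.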

@Vremold Vremold changed the title from "[torch-frontend] add elided stablehlo IR for a single Mixtral decoder layer" to "[torch-frontend] add stablehlo IRs for Mixtral model." May 30, 2024