Unable to convert Janus models to ONNX #1201

Open
turneram opened this issue Feb 20, 2025 · 0 comments
Labels: question (Further information is requested)

Question

I see that @xenova has successfully exported Janus-1.3B and Janus-Pro-1B to ONNX, presumably using some version of scripts/convert.py. We are interested in exporting Janus-Pro-7B to ONNX as well, but have not been able to do so using this script (or any other path). Attempting to convert either of the two smaller models produces the same errors, so hopefully whatever steps were taken to convert those will also work for the 7B version.

The initial error was:

ValueError: The checkpoint you are trying to load has model type `multi_modality` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

This was fixed by installing https://github.com/deepseek-ai/Janus and adding
from janus.models import MultiModalityCausalLM
to convert.py.
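
For reference, a minimal sketch of how that import might sit near the top of convert.py. The guard and the `janus_available` helper are hypothetical additions, not part of the script. The assumption is that importing `janus.models` registers the `multi_modality` model type with Transformers as a side effect, which would explain why the ValueError went away.

```python
import importlib.util


def janus_available() -> bool:
    """Return True if the `janus` package can be located (hypothetical helper)."""
    return importlib.util.find_spec("janus") is not None


if janus_available():
    # Assumption: this import registers `multi_modality` with Transformers
    # as a side effect; the imported name itself is not used afterwards.
    from janus.models import MultiModalityCausalLM  # noqa: F401
else:
    print("janus is not installed; `multi_modality` checkpoints will not load")
```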

The error that I'm now stuck on is:

KeyError: "Unknown task: any-to-any. Possible values are: `audio-classification` for AutoModelForAudioClassification, `audio-frame-classification` for AutoModelForAudioFrameClassification, `audio-xvector` for AutoModelForAudioXVector, `automatic-speech-recognition` for ('AutoModelForSpeechSeq2Seq', 'AutoModelForCTC'), `depth-estimation` for AutoModelForDepthEstimation, `feature-extraction` for AutoModel, `fill-mask` for AutoModelForMaskedLM, `image-classification` for AutoModelForImageClassification, `image-segmentation` for ('AutoModelForImageSegmentation', 'AutoModelForSemanticSegmentation'), `image-to-image` for AutoModelForImageToImage, `image-to-text` for AutoModelForVision2Seq, `mask-generation` for AutoModel, `masked-im` for AutoModelForMaskedImageModeling, `multiple-choice` for AutoModelForMultipleChoice, `object-detection` for AutoModelForObjectDetection, `question-answering` for AutoModelForQuestionAnswering, `semantic-segmentation` for AutoModelForSemanticSegmentation, `text-to-audio` for ('AutoModelForTextToSpectrogram', 'AutoModelForTextToWaveform'), `text-generation` for AutoModelForCausalLM, `text2text-generation` for AutoModelForSeq2SeqLM, `text-classification` for AutoModelForSequenceClassification, `token-classification` for AutoModelForTokenClassification, `zero-shot-image-classification` for AutoModelForZeroShotImageClassification, `zero-shot-object-detection` for AutoModelForZeroShotObjectDetection"

I can't find any indication that Optimum supports this task, so it is unclear to me how @xenova was able to get around this.
Any insight or assistance would be greatly appreciated.
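
To make the failure concrete, here is a small sketch of the lookup that appears to be going wrong. It assumes the task `any-to-any` is inferred from the model's metadata when no task is given explicitly. The task names are copied from the KeyError message; `pick_task` is a hypothetical helper for illustration and is not part of Optimum or convert.py.

```python
from typing import Optional

# Task names copied verbatim from the KeyError message.
SUPPORTED_TASKS = frozenset({
    "audio-classification", "audio-frame-classification", "audio-xvector",
    "automatic-speech-recognition", "depth-estimation", "feature-extraction",
    "fill-mask", "image-classification", "image-segmentation",
    "image-to-image", "image-to-text", "mask-generation", "masked-im",
    "multiple-choice", "object-detection", "question-answering",
    "semantic-segmentation", "text-to-audio", "text-generation",
    "text2text-generation", "text-classification", "token-classification",
    "zero-shot-image-classification", "zero-shot-object-detection",
})


def pick_task(inferred: str, override: Optional[str] = None) -> str:
    """Prefer an explicit override; reject anything not in the supported list."""
    candidate = override if override is not None else inferred
    if candidate not in SUPPORTED_TASKS:
        raise ValueError(
            f"Task {candidate!r} is not in the supported list; "
            "an explicit task would be needed instead."
        )
    return candidate


# The inferred task is rejected, while an explicit one passes the lookup.
print("any-to-any" in SUPPORTED_TASKS)
print(pick_task("any-to-any", override="text-generation"))
```

Passing the lookup is not the same as a working export: whether `text-generation` (or any other listed task) actually produces a usable ONNX graph for Janus is untested here.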

@turneram added the question label on Feb 20, 2025