Add support for phi-3-vision-128k-instruct #36
Conversation
Hi Josef, thanks, this is a really great contribution! 🚀 Could you add the test cases, run pre-commit, and rename the
Oh wow, thanks! Will do right away.
Thanks for updating the test cases! Looking at the model structure, I think we can sanitize the weights so that every model exposes the same top-level structure: a `Model` wrapping a `VisionModel` and a `LanguageModel`. This way we'll have a single structure for all models:

```python
import mlx.core as mx
import mlx.nn as nn
import numpy as np

# ModelConfig, VisionModel and LanguageModel are the per-model
# config/submodule classes defined alongside each ported model.


class Model(nn.Module):
    def __init__(self, config: ModelConfig):
        super().__init__()
        self.model_type = config.model_type
        self.config = config
        self.vision_model = VisionModel(config.vision_config)
        self.language_model = LanguageModel(config.text_config)

    def __call__(
        self,
        inputs: mx.array,
        pixel_values=None,
        image_sizes=None,
        cache=None,
    ):
        h = self.language_model.embed_tokens(inputs)
        # Negative token ids mark the image placeholder positions.
        p = np.argwhere(inputs < 0).tolist()
        if pixel_values is not None:
            h = self.vision_model(pixel_values, h, image_sizes, p)
        return self.language_model(h)
```

This will help with testing and debugging, and will simplify future contributions. Example: https://github.com/Blaizzy/mlx-vlm/blob/main/mlx_vlm/models/nanoLlava/nanoLlava.py#L240
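As a side note on the `np.argwhere(inputs < 0)` line: it relies on the convention that image placeholder tokens are encoded as negative ids in the input sequence. A minimal standalone sketch of that lookup in plain NumPy (the token values here are hypothetical, not the model's real vocabulary):

```python
import numpy as np

# Hypothetical input ids for a batch of one sequence: the two -1
# entries stand in for image placeholder tokens.
inputs = np.array([[200, 5, -1, -1, 42]])

# argwhere returns the (batch, position) coordinates of every
# placeholder, which the vision model can use to splice image
# embeddings into the text embedding sequence.
positions = np.argwhere(inputs < 0).tolist()
print(positions)  # [[0, 2], [0, 3]]
```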
Right, I will look into it.
Hey @JosefAlbers Any updates? |
@Blaizzy sorry, not yet.
Hey @JosefAlbers Any updates? |
Hi @Blaizzy, Thanks again for the feedback and for patiently guiding me through this process. I apologize for the delay in responding. After further reflection on the proposed model restructuring, however, I believe it might be more practical to defer this change to a future iteration. While I see the value in standardizing the model structure across the repository, the current structure aligns better with the original Phi-3-Vision implementation, and maintaining this alignment could be beneficial in the short term. My primary reason is to prioritize getting this model available to users as quickly as possible. There are already many newer and possibly better VLMs than Phi-3-Vision at this point (e.g., Kosmos, Florence, ...), and introducing a structural change now could delay the merge, as it would require additional testing to ensure everything still functions as expected. I'd propose merging the current implementation as-is and treating the restructuring as a follow-up.
This would allow us to revisit the structure later, when we have more bandwidth for the adjustments and testing. I'm open to discussing this further and collaborating on the best path forward. Please let me know your thoughts on this approach. Lastly, I want to thank you again for the great code you've been writing in this repository. Your work has been an invaluable learning resource for me.
Most welcome! Ayt, I agree with you: we can merge it and then update it. Before that, could you rebase your branch and check that it still works as expected?
@Blaizzy, Yep, I made a new branch rebased up to date, copy-pasted the files from the original branch into the new rebased one, and ran pre-commit and the tests. 🤞🏾
Please rebase/sync this branch with main to fix the conflicts, then run the tests and pre-commit.
Sorry for the silly error, I realize I might not have pushed after running pre-commit on my local machine. I've fixed it now.
LGTM!
Just a tiny nit.
Thank you very much @JosefAlbers for your hard work! 🎉👏🏾 We've got some more models to port here: #39
@JosefAlbers what's your twitter handle?
@Blaizzy You're very welcome, I'm just happy to help this awesome project grow. And I can't wait to see these new model ports in action!
Thank you very much! You are an absolute legend 👏🏾 Please feel free to share more suggestions or pick up any of the good first issues. I'm currently working on the trainer 🚀
As per #28