Add support for phi-3-vision-128k-instruct #36
Conversation
Hi Josef, thanks, this is a really great contribution! 🚀 Could you add the test cases, run pre-commit, and rename the
Oh wow, thanks! Will do right away.
Thanks for updating the test cases! Looking at the model structure, I think we can sanitize the weights so that every model exposes the same top-level structure: a `Model` wrapping a `VisionModel` and a `LanguageModel`. This way we'll have a single structure for all models:

```python
import mlx.core as mx
import mlx.nn as nn
import numpy as np

# ModelConfig, VisionModel and LanguageModel are the per-model
# config/submodule classes defined alongside each ported model.


class Model(nn.Module):
    def __init__(self, config: ModelConfig):
        super().__init__()
        self.model_type = config.model_type
        self.config = config
        self.vision_model = VisionModel(config.vision_config)
        self.language_model = LanguageModel(config.text_config)

    def __call__(
        self,
        inputs: mx.array,
        pixel_values=None,
        image_sizes=None,
        cache=None,
    ):
        h = self.language_model.embed_tokens(inputs)
        # Negative token ids mark the image placeholder positions.
        p = np.argwhere(inputs < 0).tolist()
        if pixel_values is not None:
            h = self.vision_model(pixel_values, h, image_sizes, p)
        return self.language_model(h)
```

This will help with testing and debugging, and will simplify future contributions. Example: https://github.com/Blaizzy/mlx-vlm/blob/main/mlx_vlm/models/nanoLlava/nanoLlava.py#L240
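As a side note on the `np.argwhere(inputs < 0)` line: it relies on the convention that image placeholder tokens are encoded as negative ids in the input sequence. A minimal standalone sketch of that lookup in plain NumPy (the token values here are hypothetical, not the model's real vocabulary):

```python
import numpy as np

# Hypothetical input ids for a batch of one sequence: the two -1
# entries stand in for image placeholder tokens.
inputs = np.array([[200, 5, -1, -1, 42]])

# argwhere returns the (batch, position) coordinates of every
# placeholder, which the vision model can use to splice image
# embeddings into the text embedding sequence.
positions = np.argwhere(inputs < 0).tolist()
print(positions)  # [[0, 2], [0, 3]]
```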
Right, I will look into it.
Hey @JosefAlbers Any updates? |
@Blaizzy sorry, not yet.
Hey @JosefAlbers Any updates? |
Hi @Blaizzy, Thanks again for the feedback and for patiently guiding me through this process. I apologize for the delay in responding. After further reflection on the proposed model restructuring, however, I believe it might be more practical to defer this change to a future iteration. While I see the value in standardizing the model structure across the repository, the current structure aligns better with the original Phi-3-Vision implementation, and maintaining this alignment could be beneficial in the short term. My primary reason is to prioritize getting this model available to users as quickly as possible. There are already many newer and possibly better VLMs than Phi-3-Vision at this point (e.g., Kosmos, Florence, ...), and introducing a structural change now could delay the merge, as it would require additional testing to ensure everything still functions as expected. I'd propose merging the current implementation as-is and treating the restructuring as a follow-up.
This would allow us to revisit the structure later, when we have more bandwidth for the adjustments and testing. I'm open to discussing this further and collaborating on the best path forward. Please let me know your thoughts on this approach. Lastly, I want to thank you again for the great code you've been writing in this repository. Your work has been an invaluable learning resource for me.
Most welcome! Ayt, I agree with you: we can merge it and then update it. Before that, could you rebase your branch and check that it still works as expected?
@Blaizzy, Yep, I made a new branch rebased up to date, copy-pasted the files from the original branch into the new rebased one, and ran pre-commit and the tests. 🤞🏾
Please rebase/sync this branch with main to fix the conflicts, then run the tests and pre-commit.
Sorry for the silly error, I realize I might not have pushed after running pre-commit on my local machine. I've fixed it now.
LGTM!
Just a tiny nit.
Thank you very much @JosefAlbers for your hard work! 🎉👏🏾 We've got some more models to port here: #39
@JosefAlbers what's your twitter handle?
@Blaizzy You're very welcome, I'm just happy to help this awesome project grow. And I can't wait to see these new model ports in action!
Thank you very much! You are an absolute legend 👏🏾 Please feel free to share more suggestions or pick up any of the good first issues. I'm currently working on the trainer 🚀
As per #28