
[Model] Refactoring of MiniCPM-V and add MiniCPM-o-2.6 support for vLLM #12069

Open · wants to merge 75 commits into base: main

Conversation

HwwwwwwwH
Contributor

HwwwwwwwH commented Jan 15, 2025

This PR aims to adapt and support all the features of MiniCPM-V and MiniCPM-o. It is designed to be compatible with various modalities (image, video, audio), different model versions (2.0, 2.5, 2.6, o), and diverse input types (raw data, embeddings), while maintaining support for LoRA, which might require significant effort.

Below is the roadmap for this PR:

  • Refactor the input processor code of MiniCPM-V for MultiModalInputsV2 of vLLM.
    • Support for image and video inputs.
    • Support for image embeddings inputs.
    • Support for video embeddings inputs.
    • Previous support for MiniCPM-o.
    • Verify LoRA support.
  • Adapt new features of MiniCPM-o.
    • Support for audio and audio embeddings inputs.
    • Support for interleaved image and audio inputs.
    • Support for audio outputs (using hidden states).
    • Streaming multimodal inputs (may be complex; consider starting a new PR for this feature in the future).

This PR is still in development. Once the audio support is complete, I will mark the PR as ready to merge. I'll get this work done ASAP.

FIX #12162
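
For reference, below is a minimal usage sketch (not part of this PR's diff) of how raw image inputs are typically passed through vLLM's offline API. The model id, the `(<image>./</image>)` placeholder format, and the local image path are assumptions based on the existing MiniCPM-V 2.6 integration:

```python
# Minimal sketch, assuming the model id and image placeholder format of the
# existing MiniCPM-V 2.6 integration; "example.jpg" is a local test image.
from PIL import Image
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "openbmb/MiniCPM-V-2_6"

llm = LLM(model=MODEL, trust_remote_code=True, max_model_len=4096)
tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)

# Build a chat prompt containing one image placeholder.
image = Image.open("example.jpg").convert("RGB")
messages = [{"role": "user",
             "content": "(<image>./</image>)\nDescribe this image."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False,
                                       add_generation_prompt=True)

# Pass the raw image alongside the prompt via multi_modal_data.
outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```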


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@ywang96
Member

ywang96 commented Jan 15, 2025

Really appreciate the effort you've planned for this PR!

Support for audio outputs (using hidden states).
Streaming multimodal inputs (may be complex; consider starting a new PR for this feature in the future).

It would be great if you could share the design decisions for these two items as an RFC (or two separate RFCs) before we proceed with the implementation. We (the vLLM team) are also thinking about how we want to support multimodal output and a streaming/realtime API in vLLM, so this is probably the best time for us to discuss these items!

@HwwwwwwwH
Contributor Author

Really appreciate the effort you've planned for this PR!

Support for audio outputs (using hidden states).
Streaming multimodal inputs (may be complex; consider starting a new PR for this feature in the future).

It would be great if you could share the design decisions for these two items as an RFC (or two separate RFCs) before we proceed with the implementation. We (the vLLM team) are also thinking about how we want to support multimodal output and a streaming/realtime API in vLLM, so this is probably the best time for us to discuss these items!

Thank you for the suggestion! I'll start on these two RFCs tomorrow.

@HwwwwwwwH
Contributor Author

@DarkLight1337 I think I might need some help with verifying LoRA support. Should I make any changes for it?

@DarkLight1337
Member

@jeejeelee can help with this. Please keep in mind though that currently LoRA is only supported for the language part of multi-modal models.
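
For context, here is a minimal sketch of how a LoRA adapter is typically exercised through vLLM's offline API; the adapter name and path are hypothetical, and the adapter is assumed to target only the language-model layers:

```python
# Minimal sketch; the adapter name/path are hypothetical, and the adapter is
# assumed to cover only the language-model part of the multimodal model.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="openbmb/MiniCPM-V-2_6",
    trust_remote_code=True,
    enable_lora=True,   # allow LoRA adapters to be attached per request
    max_lora_rank=64,
)

outputs = llm.generate(
    "Describe the MiniCPM-V model family.",
    SamplingParams(temperature=0.0, max_tokens=64),
    lora_request=LoRARequest("my_adapter", 1, "/path/to/lora_adapter"),
)
print(outputs[0].outputs[0].text)
```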

HwwwwwwwH and others added 19 commits January 22, 2025 14:28
…ect#11921)

Signed-off-by: shaochangxu.scx <[email protected]>
Co-authored-by: shaochangxu.scx <[email protected]>
Signed-off-by: hzh <[email protected]>
…roject#11100)

Signed-off-by: Akshat Tripathi <[email protected]>
Signed-off-by: Oleg Mosalov <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Co-authored-by: Oleg Mosalov <[email protected]>
Co-authored-by: Jee Jee Li <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Signed-off-by: hzh <[email protected]>
Comment on lines 934 to 940
if "image" in result["mm_placeholders"] and \
self.info.get_model_version() in [(2, 6), (2, "6O")]:
result["mm_placeholders"]["image"] = [
PlaceholderRange(offset=p["offset"] + 3 + idx // 10,
length=p["length"] - 3 - idx // 10)
for idx, p in enumerate(result["mm_placeholders"]["image"])
]
Member


You can use PromptReplacementDetails (introduced by #12269) to simplify this code
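
For reference, a standalone sketch of what the adjustment above computes, with made-up placeholder values; each image placeholder is shifted forward by `3 + idx // 10` tokens and shortened by the same amount, presumably to exclude the per-image prefix tokens from the placeholder range:

```python
# Standalone illustration with made-up offsets/lengths (not the real values
# produced by the processor).
placeholders = [
    {"offset": 10, "length": 100},   # first image
    {"offset": 120, "length": 100},  # second image
]

adjusted = [
    {"offset": p["offset"] + 3 + idx // 10,
     "length": p["length"] - 3 - idx // 10}
    for idx, p in enumerate(placeholders)
]

print(adjusted)
# [{'offset': 13, 'length': 97}, {'offset': 123, 'length': 97}]
```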

@DarkLight1337
Member

Please take a look at the CI failures from before: https://buildkite.com/vllm/fastcheck/builds/12337#0194961e-b0c4-46a6-8033-93d6ca567c9f

Can you fix the tests locally? Afterwards I'll unblock the tests on CI.

@PancakeAwesome

When will this feature be available if only non-streaming image input is used?

@HwwwwwwwH
Contributor Author

When will this feature be available if only non-streaming image input is used?

For non-streaming image input support of MiniCPM-o-2_6, you can use our own fork and build vLLM from source.
I think this PR will be merged within the next couple of days. Sorry for the wait.

@zhy844694805

I am very eager to test the MiniCPM-o-2.6 model with the vLLM engine, but the PR hasn't been merged yet. For a 4090/5090 GPU running vLLM inference on MiniCPM-o-2.6 with audio input and output, what concurrency and latency would you roughly estimate?

@zhy844694805

When will this feature be available if only non-streaming image input is used?

For non-streaming image input support of MiniCPM-o-2_6, you can use our own fork and build vLLM from source. I think this PR will be merged within the next couple of days. Sorry for the wait.

Will the TTS feature also be merged within the next couple of days? (I ask because TTS is not checked off on your roadmap.) If so, that would be great.

Successfully merging this pull request may close these issues.

[New Model]: openbmb/MiniCPM-o-2_6