MiniCPM-o-2_6 data processing issue #6793

jinzhuoran · 2025-02-02T10:12:25Z

Reminder

I have read the above rules and searched the existing issues.

System Info

When a single image is input, https://github.com/hiyouga/LLaMA-Factory/blob/main/src/llamafactory/data/mm_plugin.py#L555 computes valid_image_nums=3. This is probably caused by the input_ids_ == processor.tokenizer.slice_end_id condition at https://github.com/hiyouga/LLaMA-Factory/blob/main/src/llamafactory/data/mm_plugin.py#L551, since one image is split into multiple slices.

However, this makes the computation at https://github.com/hiyouga/LLaMA-Factory/blob/main/src/llamafactory/data/mm_plugin.py#L519 incorrect. For example, samples earlier in a batch are assigned multiple images while later samples get empty images, possibly because the parameter at https://github.com/hiyouga/LLaMA-Factory/blob/main/src/llamafactory/data/mm_plugin.py#L538 is not used.

How can this be resolved?
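To make the reported symptom concrete, here is a minimal, self-contained sketch of how counting slice end markers as images can shift images toward earlier samples in a batch. It is not the LLaMA-Factory code: the token ids, the sequence layout, and the helper names (count_end_markers, split_by_counts, one_image_sequence) are all hypothetical and chosen only for illustration.

```python
# Hypothetical token ids, for illustration only (not the real tokenizer values).
IM_START, IM_END = 101, 102
SLICE_START, SLICE_END = 111, 112
PATCH = 0  # stand-in for an image placeholder token


def count_end_markers(input_ids, end_ids):
    """Count how many tokens in input_ids are one of the given end markers."""
    return sum(1 for tok in input_ids if tok in end_ids)


def split_by_counts(items, counts):
    """Hand out items to samples in order, taking counts[i] items for sample i."""
    out, cursor = [], 0
    for n in counts:
        out.append(items[cursor:cursor + n])
        cursor += n
    return out


def one_image_sequence(num_slices):
    """Assumed layout for one image: an overview block followed by its slices."""
    seq = [IM_START, PATCH, IM_END]
    for _ in range(num_slices):
        seq += [SLICE_START, PATCH, SLICE_END]
    return seq


# Two samples, each with one image that is cut into two slices.
batch_ids = [one_image_sequence(2), one_image_sequence(2)]
images = ["img_a", "img_b"]

# Counting slice ends as images: each sample appears to hold 3 images.
inflated = [count_end_markers(ids, {IM_END, SLICE_END}) for ids in batch_ids]
# Counting only image ends: each sample holds 1 image.
per_image = [count_end_markers(ids, {IM_END}) for ids in batch_ids]

print(inflated, split_by_counts(images, inflated))
print(per_image, split_by_counts(images, per_image))
```

With the inflated counts, the first sample takes both images and the second gets none, which matches the behavior described above. Counting per image keeps one image per sample.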

Reproduction

Put your message here.

Others

No response


jinzhuoran · 2025-02-02T10:18:34Z

I suspect the problem is in the function at https://github.com/hiyouga/LLaMA-Factory/blob/main/src/llamafactory/data/mm_plugin.py#L492

hiyouga · 2025-02-02T10:42:46Z

cc @BUAADreamer

BUAADreamer · 2025-02-03T09:10:08Z

Could you apply the changes from this PR and check whether it works correctly? @jinzhuoran

#6801

jinzhuoran · 2025-02-04T02:22:58Z

Thanks for the reply, the problem is solved.

jinzhuoran · 2025-02-04T02:46:36Z

One more issue to report: with setattr(config, "init_audio", True), an error seems to be raised when the input contains no audio data:

input_embeddings = input_embeddings + audio_embeddings[0].mean() * 0
IndexError: list index out of range

Empty audio input may need to be handled explicitly.
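As a rough illustration of the kind of guard meant here, the sketch below checks for an empty list before indexing it. This is only one possible approach and not necessarily what the upstream modeling_minicpmo.py does. The Vec class is a hypothetical stand-in for a tensor so the example runs without torch, and zero_audio_term is an illustrative name.

```python
class Vec:
    """Tiny stand-in for a tensor, only so this sketch runs without torch."""

    def __init__(self, values):
        self.values = list(values)

    def mean(self):
        return sum(self.values) / len(self.values)


def zero_audio_term(audio_embeddings):
    """Zero-valued term from the first audio embedding, or 0.0 if there is none."""
    if not audio_embeddings:  # no audio in this batch: skip the indexing
        return 0.0
    return audio_embeddings[0].mean() * 0


print(zero_audio_term([]))
print(zero_audio_term([Vec([1.0, 2.0, 3.0])]))
```

Whether simply skipping the term is acceptable depends on the training setup, since the original expression appears intended to keep the audio branch involved in the computation even when it contributes nothing.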

hiyouga · 2025-02-04T02:47:03Z

@jinzhuoran Could you post the line number?

jinzhuoran · 2025-02-04T02:50:13Z

The failing code is in MiniCPM: https://huggingface.co/openbmb/MiniCPM-o-2_6/blob/main/modeling_minicpmo.py#L615

BUAADreamer · 2025-02-04T03:04:32Z

Updating to the latest modeling_minicpmo.py should fix it.

BUAADreamer · 2025-02-04T07:30:53Z

@jinzhuoran Is it working now?

jinzhuoran · 2025-02-04T09:14:09Z

Thanks, it works now.

jinzhuoran added the bug (Something isn't working) and pending (This problem is yet to be addressed) labels Feb 2, 2025

hiyouga assigned BUAADreamer Feb 2, 2025

jinzhuoran closed this as completed Feb 4, 2025

hiyouga mentioned this issue Feb 4, 2025

[mm_plugin] Fix Image Process of MiniCPMV #6801

Merged

2 tasks

jinzhuoran reopened this Feb 4, 2025

BUAADreamer closed this as completed Feb 4, 2025

BUAADreamer added the solved (This problem has been already solved) label and removed the bug (Something isn't working) and pending (This problem is yet to be addressed) labels Feb 4, 2025

BUAADreamer mentioned this issue Feb 5, 2025

Fine-tuned the model on my own data, but the results are not ideal; is further optimization possible? OpenBMB/MiniCPM-o#813

Open
