MiniCPM-o-2_6 data processing issue #6793

Closed
1 task done
jinzhuoran opened this issue Feb 2, 2025 · 10 comments · Fixed by #6801
Labels
solved This problem has been already solved

Comments


jinzhuoran commented Feb 2, 2025

Reminder

  • I have read the above rules and searched the existing issues.

System Info

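When a single image is given as input, https://github.com/hiyouga/LLaMA-Factory/blob/main/src/llamafactory/data/mm_plugin.py#L555 computes valid_image_nums=3. This is probably caused by the check input_ids_ == processor.tokenizer.slice_end_id at https://github.com/hiyouga/LLaMA-Factory/blob/main/src/llamafactory/data/mm_plugin.py#L551, because a single image is split into multiple slices.

However, this makes the computation at https://github.com/hiyouga/LLaMA-Factory/blob/main/src/llamafactory/data/mm_plugin.py#L519 go wrong: for example, samples near the front of a batch are assigned multiple images, while samples near the end get empty images. This may be because the parameter at https://github.com/hiyouga/LLaMA-Factory/blob/main/src/llamafactory/data/mm_plugin.py#L538 is not used.

How can this problem be solved?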
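The symptom described above can be illustrated with a minimal sketch. This is not the code in mm_plugin.py: the token ids, the helper names (`count_by_end_tokens`, `greedy_split`, `split_by_counts`) and the string stand-ins for images are all assumptions made for the example. It shows how counting slice-end tokens as images inflates the per-sample count, and how slicing a flat image list with inflated counts gives early samples too many images and later samples none.

```python
from typing import List

# Hypothetical token ids, chosen only for this illustration.
IM_END_ID = 101
SLICE_END_ID = 102


def count_by_end_tokens(input_ids: List[int]) -> int:
    """Count image-end AND slice-end tokens (over-counts a sliced image)."""
    return sum(1 for t in input_ids if t in (IM_END_ID, SLICE_END_ID))


def greedy_split(flat_images: List[str], counts: List[int]) -> List[List[str]]:
    """Slice a flat image list by per-sample counts, without validation."""
    out, start = [], 0
    for n in counts:
        out.append(flat_images[start:start + n])
        start += n
    return out


def split_by_counts(flat_images: List[str], counts: List[int]) -> List[List[str]]:
    """Same split, but reject counts that do not match the image list."""
    if sum(counts) != len(flat_images):
        raise ValueError("per-sample counts do not match the number of images")
    return greedy_split(flat_images, counts)


# One image encoded as a global view plus two slices: three "end" tokens.
sample = [1, 100, 7, IM_END_ID, 110, 7, SLICE_END_ID, 110, 7, SLICE_END_ID, 2]
batch_ids = [sample, sample]
flat_images = ["img_a", "img_b"]  # one real image per sample

token_counts = [count_by_end_tokens(ids) for ids in batch_ids]
misassigned = greedy_split(flat_images, token_counts)  # first sample takes both
assigned = split_by_counts(flat_images, [1, 1])        # explicit image counts
```

With explicit per-sample image counts, each sample receives exactly its own image; with the token-derived counts, the first sample absorbs both images and the second is left empty.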

Reproduction

Put your message here.

Others

No response

jinzhuoran added the bug and pending labels Feb 2, 2025

hiyouga commented Feb 2, 2025

cc @BUAADreamer

BUAADreamer commented

Could you apply the changes from this PR and check whether it works correctly now? @jinzhuoran

#6801

jinzhuoran commented

Thanks for the reply. The problem is solved.

jinzhuoran commented

I would also like to report another problem: with setattr(config, "init_audio", True), an error seems to be raised when the input contains no audio data:

input_embeddings = input_embeddings + audio_embeddings[0].mean() * 0
IndexError: list index out of range

The audio path probably needs some handling for empty input.
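A minimal sketch of the kind of guard suggested here, assuming plain Python lists in place of tensors. The function name `add_audio_placeholder` is hypothetical, and this is not the fix that was made in modeling_minicpmo.py; it only shows that checking for an empty `audio_embeddings` list before indexing avoids the IndexError.

```python
from typing import List, Sequence


def add_audio_placeholder(
    input_embeddings: List[float],
    audio_embeddings: Sequence[Sequence[float]],
) -> List[float]:
    """Add a zero-valued audio term only when audio embeddings exist."""
    # Guard: with no audio in the batch there is nothing to index.
    if len(audio_embeddings) == 0 or len(audio_embeddings[0]) == 0:
        return list(input_embeddings)
    first = audio_embeddings[0]
    # Mirrors `audio_embeddings[0].mean() * 0`: numerically zero.
    zero_term = (sum(first) / len(first)) * 0
    return [x + zero_term for x in input_embeddings]


no_audio = add_audio_placeholder([1.0, 2.0], [])
with_audio = add_audio_placeholder([1.0, 2.0], [[0.5, 1.5]])
```

In both cases the returned values equal the input embeddings; the only difference is that the empty case no longer raises.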


hiyouga commented Feb 4, 2025

@jinzhuoran Could you post the line number in the code?


BUAADreamer commented Feb 4, 2025

Updating to the latest modeling_minicpmo.py code should fix it.

BUAADreamer added the solved label and removed the bug and pending labels Feb 4, 2025
BUAADreamer commented

@jinzhuoran Is it working now?

jinzhuoran commented

Thanks, it works correctly now.
