Bank of Urumqi: troubleshooting CP zombie processes #193

Closed
xqs1986 opened this issue Jan 6, 2025 · 3 comments
Labels
bug Something isn't working

Comments

xqs1986 commented Jan 6, 2025

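1. The CP in both the Bank of Urumqi Huawei Cloud core environment (ARM) and the credit environment (Hygon x86) produces a large number of zombie processes. This disrupts scheduling of the customer's business Pods and triggers Huawei Cloud platform alerts.
2. All on-site nodes run the Kylin (银河麒麟) operating system.
3. I verified in my personal test environment that, with the same versions of CP (0.7.5) and CPM (1.2.2.1) on CentOS, neither updating the CPM policy nor restarting the business Pods produces zombie processes.

Please help investigate the cause.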

Uploading 50086e7f-fa94-4f1f-9b2d-ec71201702e7.jpg…
Uploading ae40efe2-3f8e-415d-941c-ad221a88f37b.png…
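
To narrow down which process is failing to reap its children on the affected nodes, zombies can be grouped by parent PID. The following is a minimal sketch that assumes a Linux `/proc` filesystem. The helper names `parse_stat` and `zombies_by_parent` are illustrative and are not part of CP or CPM.

```python
import os
from collections import Counter


def parse_stat(stat_text):
    """Return (pid, comm, state, ppid) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left])
    comm = stat_text[left + 1:right]
    fields = stat_text[right + 1:].split()
    return pid, comm, fields[0], int(fields[1])


def zombies_by_parent(proc_root="/proc"):
    """Count zombie (state 'Z') processes per parent PID."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                _, _, state, ppid = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited (or was unreadable) while scanning
        if state == "Z":
            counts[ppid] += 1
    return counts


if __name__ == "__main__":
    for ppid, n in zombies_by_parent().most_common():
        print(f"parent pid {ppid}: {n} zombie children")
```

If most zombies share one parent PID, that parent's name (from `/proc/<ppid>/comm`) shows which component is not collecting exit statuses.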

@xqs1986 xqs1986 added the bug Something isn't working label Jan 6, 2025

github-actions bot commented Jan 6, 2025

Message that will be displayed on users first issue

robbietu commented Jan 8, 2025

Please check the log files and provide the relevant entries, for example "kill running pktg ...." and "wait pid failed...".
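
One way to pull out those entries is sketched below. It assumes only that the two quoted fragments appear verbatim in the log lines. The log file location and line format are not given in this thread, so the path is taken from the command line, and the name `relevant_lines` is illustrative.

```python
import sys

# The two message fragments quoted above; the surrounding log format is unknown.
PATTERNS = ("kill running pktg", "wait pid failed")


def relevant_lines(lines, patterns=PATTERNS):
    """Yield (line_number, line) for lines containing any of the fragments."""
    for number, line in enumerate(lines, start=1):
        if any(p in line for p in patterns):
            yield number, line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python filter_log.py <path-to-log-file>
    with open(sys.argv[1], errors="replace") as f:
        for number, line in relevant_lines(f):
            print(f"{number}: {line}")
```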

@robbietu robbietu closed this as completed Jan 9, 2025
robbietu commented Jan 9, 2025

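Based on the on-site logs (0000000000.txt), two questions are currently open (items 1 and 2). Items 3 and 4 are related observations:

  1. Why does CPM trigger policy updates at 00:59 and 1:00?
  2. After the first policy update was received, the shell script run by probedaemon became a zombie process.
  3. After the second policy update was received, the same shell script ran without any problem.
  4. Repeatedly updating the policy manually in the same on-site environment did not produce any zombie processes either.

These points need further investigation.

Uploading 1.jpg…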
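
For reference, the general mechanism behind a zombie child is that it has exited (or been killed) but its parent has not yet collected the exit status with `wait`/`waitpid`. The sketch below illustrates that mechanism only. It is not probedaemon's code, and whether this is what happened on the first policy update is still unconfirmed. The names `proc_state` and `run_and_reap` are hypothetical, and `proc_state` assumes a Linux `/proc`.

```python
import subprocess
import time


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            text = f.read()
    except OSError:
        return None
    # The state is the first field after the parenthesised command name.
    return text[text.rindex(")") + 1:].split()[0]


def run_and_reap(script, timeout=5.0):
    """Run a shell script and always collect its exit status."""
    child = subprocess.Popen(["sh", "-c", script])
    try:
        return child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        child.kill()
        # Reap after the kill; without this wait the child stays a zombie
        # for as long as the parent keeps running.
        return child.wait()


if __name__ == "__main__":
    child = subprocess.Popen(["sh", "-c", "exit 3"])
    time.sleep(0.2)                # give the child time to exit
    print(proc_state(child.pid))   # state before the parent has waited
    print(child.wait())            # collecting the status reaps the child
    print(proc_state(child.pid))   # state after reaping
```

On Linux the first state printed is normally `Z`, and the entry disappears once `wait()` returns. A parent that kills a script on a policy update but skips or fails the follow-up wait would leave exactly this kind of entry behind.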
