[Bug]: The 4c16g query node still experienced OOM issues even after memory protection was set up with a low water level of 0.75 and a high water level of 0.85. #39866

Open
1 task done
zhuwenxing opened this issue Feb 13, 2025 · 1 comment
Labels: kind/bug, triage/accepted

zhuwenxing commented Feb 13, 2025

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:master-20250213-dccba87f-amd64
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

The query nodes keep hitting OOM. Two of the three query node pods are in CrashLoopBackOff:

❯ k get pod|grep fts-stable-test-13
fts-stable-test-13-etcd-0                                         1/1     Running            0                163m
fts-stable-test-13-etcd-1                                         1/1     Running            0                163m
fts-stable-test-13-etcd-2                                         1/1     Running            0                163m
fts-stable-test-13-kafka-0                                        2/2     Running            1 (162m ago)     163m
fts-stable-test-13-kafka-1                                        2/2     Running            0                163m
fts-stable-test-13-kafka-2                                        2/2     Running            0                163m
fts-stable-test-13-kafka-exporter-79c8654c6d-xx7gd                1/1     Running            3 (162m ago)     163m
fts-stable-test-13-milvus-datanode-7596f777cd-vchhw               1/1     Running            2 (162m ago)     163m
fts-stable-test-13-milvus-datanode-7596f777cd-vv5nj               1/1     Running            2 (162m ago)     163m
fts-stable-test-13-milvus-indexnode-84c577bd8c-5vn8c              1/1     Running            2 (162m ago)     163m
fts-stable-test-13-milvus-indexnode-84c577bd8c-rqv8m              1/1     Running            2 (162m ago)     163m
fts-stable-test-13-milvus-mixcoord-74f4998554-rjkzg               1/1     Running            2 (162m ago)     163m
fts-stable-test-13-milvus-proxy-7c4dd9bb48-cfsrj                  1/1     Running            2 (162m ago)     163m
fts-stable-test-13-milvus-querynode-7d4ff9657-7v9hd               1/1     Running            2 (162m ago)     163m
fts-stable-test-13-milvus-querynode-7d4ff9657-9fzns               0/1     CrashLoopBackOff   21 (74s ago)     163m
fts-stable-test-13-milvus-querynode-7d4ff9657-flvww               0/1     CrashLoopBackOff   20 (2m43s ago)   163m
fts-stable-test-13-minio-0                                        1/1     Running            0                163m
fts-stable-test-13-minio-1                                        1/1     Running            0                163m
fts-stable-test-13-minio-2                                        1/1     Running            0                163m
fts-stable-test-13-minio-3                                        1/1     Running            0                163m
fts-stable-test-13-zookeeper-0                                    1/1     Running            0                163m
fts-stable-test-13-zookeeper-1                                    1/1     Running            0                163m
fts-stable-test-13-zookeeper-2                                    1/1     Running            0                163m

extraConfigFiles:
  user.yaml: |+
    dataCoord:
      compaction:
        indexBasedCompaction: false
    indexCoord:
      scheduler:
        interval: 100
    queryNode:
        mmap:
          vectorField: true
          vectorIndex: true
          scalarField: true
          scalarIndex: true
    quotaAndLimits:
      limitWriting:
        memProtection:
          dataNodeMemoryLowWaterLevel: 0.75
          dataNodeMemoryHighWaterLevel: 0.85
          queryNodeMemoryLowWaterLevel: 0.75
          queryNodeMemoryHighWaterLevel: 0.85
    trace:
      exporter: jaeger
      sampleFraction: 1
      jaeger:
        url: http://tempo-distributor.tempo:14268/api/traces
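
For reference, the water levels in this config are fractions of a node's total memory. The sketch below converts them to absolute thresholds, assuming the query node's memory limit is 16 GiB (the "16g" in the title). It also assumes the behavior described in the comments of the default milvus.yaml: no action below the low level, writes slowed between the two levels, writes rejected above the high level. The function names are illustrative and are not Milvus code. Since these settings sit under limitWriting, they appear to throttle writes only and may not cap the node's memory usage by themselves.

```python
GIB = 1024 ** 3


def water_level_bytes(total_bytes, low=0.75, high=0.85):
    """Return the (low, high) water levels in bytes for the given fractions."""
    if not 0 < low < high <= 1:
        raise ValueError("expected 0 < low < high <= 1")
    return total_bytes * low, total_bytes * high


def write_action(used_bytes, total_bytes, low=0.75, high=0.85):
    """Classify memory usage against the water levels (assumed semantics)."""
    low_b, high_b = water_level_bytes(total_bytes, low, high)
    if used_bytes <= low_b:
        return "allow"
    if used_bytes <= high_b:
        return "slow down"
    return "reject"


if __name__ == "__main__":
    total = 16 * GIB  # assumption: 16 GiB memory limit per query node
    low_b, high_b = water_level_bytes(total)
    print(f"low water level:  {low_b / GIB:.1f} GiB")
    print(f"high water level: {high_b / GIB:.1f} GiB")
    for used_gib in (10, 13, 15):
        print(used_gib, "GiB used ->", write_action(used_gib * GIB, total))
```

Under these assumptions the low and high levels fall at 12.0 GiB and 13.6 GiB, which leaves 2.4 GiB of headroom between the point where writes are rejected and the 16 GiB limit.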

Image

Expected Behavior

The query nodes should not OOM with memory protection configured.

Steps To Reproduce

Milvus Log

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/full%20text%20search%20stable%20test/detail/full%20text%20search%20stable%20test/13/pipeline
log:

artifacts-fts-stable-test-13-server-logs.tar.gz

cluster: 4am
ns: chaos-testing
pod info

[2025-02-13T11:18:03.077Z] + kubectl get pods -o wide
[2025-02-13T11:18:03.083Z] + grep fts-stable-test-13
[2025-02-13T11:18:03.342Z] fts-stable-test-13-etcd-0                                         1/1     Running            0                164m    10.104.15.118   4am-node20   <none>           <none>
[2025-02-13T11:18:03.342Z] fts-stable-test-13-etcd-1                                         1/1     Running            0                164m    10.104.24.106   4am-node29   <none>           <none>
[2025-02-13T11:18:03.342Z] fts-stable-test-13-etcd-2                                         1/1     Running            0                164m    10.104.26.95    4am-node32   <none>           <none>
[2025-02-13T11:18:03.342Z] fts-stable-test-13-kafka-0                                        2/2     Running            1 (164m ago)     164m    10.104.26.90    4am-node32   <none>           <none>
[2025-02-13T11:18:03.342Z] fts-stable-test-13-kafka-1                                        2/2     Running            0                164m    10.104.24.107   4am-node29   <none>           <none>
[2025-02-13T11:18:03.342Z] fts-stable-test-13-kafka-2                                        2/2     Running            0                164m    10.104.16.5     4am-node21   <none>           <none>
[2025-02-13T11:18:03.342Z] fts-stable-test-13-kafka-exporter-79c8654c6d-xx7gd                1/1     Running            3 (164m ago)     164m    10.104.23.236   4am-node27   <none>           <none>
[2025-02-13T11:18:03.342Z] fts-stable-test-13-milvus-datanode-7596f777cd-vchhw               1/1     Running            2 (164m ago)     164m    10.104.21.9     4am-node24   <none>           <none>
[2025-02-13T11:18:03.342Z] fts-stable-test-13-milvus-datanode-7596f777cd-vv5nj               1/1     Running            2 (164m ago)     164m    10.104.30.218   4am-node38   <none>           <none>
[2025-02-13T11:18:03.342Z] fts-stable-test-13-milvus-indexnode-84c577bd8c-5vn8c              1/1     Running            2 (164m ago)     164m    10.104.23.238   4am-node27   <none>           <none>
[2025-02-13T11:18:03.342Z] fts-stable-test-13-milvus-indexnode-84c577bd8c-rqv8m              1/1     Running            2 (164m ago)     164m    10.104.25.224   4am-node30   <none>           <none>
[2025-02-13T11:18:03.342Z] fts-stable-test-13-milvus-mixcoord-74f4998554-rjkzg               1/1     Running            2 (164m ago)     164m    10.104.30.219   4am-node38   <none>           <none>
[2025-02-13T11:18:03.342Z] fts-stable-test-13-milvus-proxy-7c4dd9bb48-cfsrj                  1/1     Running            2 (164m ago)     164m    10.104.23.235   4am-node27   <none>           <none>
[2025-02-13T11:18:03.342Z] fts-stable-test-13-milvus-querynode-7d4ff9657-7v9hd               1/1     Running            2 (164m ago)     164m    10.104.23.237   4am-node27   <none>           <none>
[2025-02-13T11:18:03.342Z] fts-stable-test-13-milvus-querynode-7d4ff9657-9fzns               0/1     CrashLoopBackOff   21 (3m ago)      164m    10.104.30.220   4am-node38   <none>           <none>
[2025-02-13T11:18:03.342Z] fts-stable-test-13-milvus-querynode-7d4ff9657-flvww               0/1     CrashLoopBackOff   20 (4m29s ago)   164m    10.104.32.80    4am-node39   <none>           <none>
[2025-02-13T11:18:03.342Z] fts-stable-test-13-minio-0                                        1/1     Running            0                164m    10.104.15.119   4am-node20   <none>           <none>
[2025-02-13T11:18:03.342Z] fts-stable-test-13-minio-1                                        1/1     Running            0                164m    10.104.24.105   4am-node29   <none>           <none>
[2025-02-13T11:18:03.342Z] fts-stable-test-13-minio-2                                        1/1     Running            0                164m    10.104.26.94    4am-node32   <none>           <none>
[2025-02-13T11:18:03.342Z] fts-stable-test-13-minio-3                                        1/1     Running            0                164m    10.104.17.29    4am-node23   <none>           <none>
[2025-02-13T11:18:03.342Z] fts-stable-test-13-zookeeper-0                                    1/1     Running            0                164m    10.104.15.117   4am-node20   <none>           <none>
[2025-02-13T11:18:03.342Z] fts-stable-test-13-zookeeper-1                                    1/1     Running            0                164m    10.104.26.93    4am-node32   <none>           <none>
[2025-02-13T11:18:03.342Z] fts-stable-test-13-zookeeper-2                                    1/1     Running            0                164m    10.104.24.109   4am-node29   <none>           <none>
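
CrashLoopBackOff alone does not show why a container restarts. One way to confirm that the restarts are OOM kills is to inspect each container's last termination reason in the pod status, which Kubernetes reports as OOMKilled when the container was killed for exceeding its memory limit. The sketch below filters JSON shaped like the output of `kubectl get pods -o json`. The function name and the sample pods are hypothetical and are not taken from this cluster.

```python
import json


def oom_killed_containers(pod_list):
    """Return (pod, container, restarts) for containers last terminated by an OOM kill."""
    hits = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        for cs in pod.get("status", {}).get("containerStatuses", []):
            # lastState.terminated is absent for containers that never restarted.
            terminated = cs.get("lastState", {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                hits.append((pod_name, cs["name"], cs.get("restartCount", 0)))
    return hits


# Hypothetical sample in the shape of `kubectl get pods -o json` output.
SAMPLE_JSON = """
{
  "items": [
    {
      "metadata": {"name": "example-querynode-a"},
      "status": {"containerStatuses": [
        {"name": "querynode", "restartCount": 21,
         "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}}
      ]}
    },
    {
      "metadata": {"name": "example-proxy-b"},
      "status": {"containerStatuses": [
        {"name": "proxy", "restartCount": 2,
         "lastState": {"terminated": {"reason": "Error", "exitCode": 1}}}
      ]}
    },
    {
      "metadata": {"name": "example-etcd-0"},
      "status": {"containerStatuses": [
        {"name": "etcd", "restartCount": 0, "lastState": {}}
      ]}
    }
  ]
}
"""

if __name__ == "__main__":
    for pod, container, restarts in oom_killed_containers(json.loads(SAMPLE_JSON)):
        print(f"{pod}/{container}: OOMKilled, {restarts} restarts")
```

To run it against a real namespace, the sample string could be replaced with the saved output of `kubectl get pods -n chaos-testing -o json`.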

Anything else?

No response

zhuwenxing added the kind/bug and needs-triage labels Feb 13, 2025
zhuwenxing commented

/assign @bigsheeper
PTAL

yanliang567 added the triage/accepted label and removed the needs-triage label Feb 15, 2025
yanliang567 added this to the 2.6.0 milestone Feb 15, 2025
yanliang567 removed their assignment Feb 15, 2025