V6: high sustained CPU and memory load from pihole-FTL #2194
Do you only see high load when accessing the web interface or also during normal operation?
I was just gonna create an issue about this as well. I'm having the same issue on 3 different systems, all running Pi-hole on Docker: one arm64 and two amd64 hosts. Memory usage seems to be close to v5 on all of them. CPU usage, though, increases quite a bit while using the web interface. Anything live-updating (like recent queries and total queries on the home page) also spikes the CPU a lot more than v5.
No difference between the two. I'm seeing CPU spikes as high as 69%. This is a secondary, so very little traffic.
Also seeing CPU usage as high as 60% during normal operation.
Debug logs: https://tricorder.pi-hole.net/Hx4uwChN/
Just adding to the conversation: some graphs with the container's disk/CPU/memory usage since the upgrade, as reported by the Proxmox host (using Unbound as recursive DNS resolver, though I'm not sure that's relevant to the issue). Network usage is pretty much the same as v5. I'm also noticing some random stutter in DNS resolution, maybe related to the huge spike in resource usage: when it happens, resolution takes about 5-10 seconds, compared to the nearly instantaneous response expected for a LAN install. Happy to provide more info as requested by you devs. Thanks in advance.
A somewhat larger RAM footprint is expected for v6.0. It mainly comes from an additional in-memory database we need so we can offer the new server-side Query Log feature with reasonable responsiveness; it just needs a few extra B-trees to live in memory to be fast. I am pretty unsure where the disk reads are coming from. Could you please use something like
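Not necessarily the tool meant above, but as a rough sketch, the kernel's per-process I/O counters show how much pihole-FTL has actually read from disk (this assumes task I/O accounting is enabled in the kernel):

```bash
# Cumulative bytes pihole-FTL has read/written since it started
# (read_bytes/write_bytes require CONFIG_TASK_IO_ACCOUNTING).
sudo cat "/proc/$(pidof pihole-FTL)/io"
# Repeat after a minute and compare read_bytes to estimate the read rate.
```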
This is htop with thread names. I'm not seeing sustained spikes at the moment, but quite a lot of CPU time has been used on the thread. I need a reboot to get query IO % for the container, because of a kernel config flag I need to enable on the host (
One observation from today: CPU consumption was low until I accessed the dashboard via browser, then CPU increased dramatically. Here is htop output with the dashboard being accessed:
Here is htop with the dashboard closed:
Here is what you actually asked for, but CPU usage is now low:
Could you please run
It will probably need some time (and may require
Does this solve your slowness issues?
I'm not seeing high CPU utilization, nor issues with the web interface. Would it make sense to run this after doing the 6.x update, or is something similar already going on in the background, perhaps being the source of the high CPU?
Here is another highish CPU snapshot:
I ended up here through Reddit, where a bunch more people are reporting the same issue: https://www.reddit.com/r/pihole/comments/1iss62l/pihole_v6_extremely_slow_gui_high_cpu_usage/
I really wish major upgrades via pihole -up would at least prompt before continuing. Some of us had custom nginx/fpm/doh/dot/vpn setups that got completely hosed.
Same issue here ever since upgrading. As soon as I point clients to it, requests start lagging, there are intermittent outages, and it pegs a single CPU. All 10.255.255 IPs in the recording are local to the Pi-hole, and it spikes on each request.
Screen.Recording.2025-02-20.at.6.00.58.PM.mov
If I move /etc/pihole/pihole-FTL.db, then it doesn't get recreated. Here's my debug data: https://tricorder.pi-hole.net/hD9JvUof/
Why not? Do you have any logs for us?
No, moving the corrupted database away is what is currently being suggested.
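For anyone who wants to confirm that the long-term database is actually corrupted before moving it away, a minimal sketch (assuming sqlite3 is installed and the database is at the default path):

```bash
sudo systemctl stop pihole-FTL
# integrity_check walks the whole file and can take a while on large databases;
# anything other than a single "ok" line indicates corruption.
sudo sqlite3 /etc/pihole/pihole-FTL.db "PRAGMA integrity_check;"
sudo systemctl start pihole-FTL
```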
This issue has been mentioned on Pi-hole Userspace. There might be relevant details there: https://discourse.pi-hole.net/t/web-interface-slow-after-update-from-5-to-6/76280/17
@DL6ER did you already rule out that this could be caused by the database size? Is there a quick way to fill a query db to 1.5 GiB with Pi-hole v6? I'm happy to test a bit on my RPi Zero W.
I tested this myself, blowing up the database with identical copies of the same query (only the ID incrementing) to almost 20 GB without any issues. Only when at least one of the indexes is corrupted do we get these issues, as the database then has no other way than reading the entire table and performing a manual search. This seems to be what is happening in virtually all comparable cases I have seen so far. It remains totally unclear to me why the upgrade of
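As a generic sketch (not a project-specific check), the indexes SQLite actually sees in the database can be listed straight from sqlite_master; if an expected index is missing or integrity_check complains about one, lookups fall back to full table scans:

```bash
sudo sqlite3 /etc/pihole/pihole-FTL.db \
  "SELECT name, tbl_name FROM sqlite_master WHERE type='index';"
```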
Can you give me a script to blow up a v5 query db? 1.5 GiB is sufficient I guess, so I do not need to raise my test VM's disk size 😅. Maybe I can replicate it when upgrading to v6.
Yeah, I thought that maybe the library used by the web UI, or the way the web UI does the query call, might somehow have a different result compared to the CLI call. But I missed that you already found an actual corruption to be the cause.
https://github.com/pi-hole/FTL/blob/v5.25.2/test/pihole-FTL.db.sql should generate a minimal v5 database.
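Building on that file, a rough sketch of how one could inflate such a minimal v5 database by repeatedly duplicating the rows of the queries table; the path and iteration count are placeholders, and the column list is read at runtime so no particular schema version is assumed:

```bash
#!/usr/bin/env bash
set -e
DB=pihole-FTL.db   # placeholder: the test database created from the .sql file above

# Column list of the queries table minus the id primary key,
# so SQLite assigns fresh ids to the duplicated rows.
cols=$(sqlite3 "$DB" \
  "SELECT group_concat(name) FROM pragma_table_info('queries') WHERE name <> 'id';")

# Each pass doubles the row count; raise the loop limit until the file is big enough.
for i in $(seq 1 20); do
  sqlite3 "$DB" "INSERT INTO queries ($cols) SELECT $cols FROM queries;"
  echo "pass $i: $(stat -c %s "$DB") bytes"
done
```

Twenty doublings of a few seed rows only gives a couple of million entries, so reaching 1.5 GiB will likely need a few more passes.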
Idk, it would just say the db file was missing is all.
I tried again, and this time it got recreated, but the file was barely 400 MB to begin with and it passed integrity checks:
I'll point some more clients at it now and see if it still has outages/sluggish performance.
I'm still experiencing the same sluggish performance once it reaches around 5-30 q/s. I tried checking the UI, but it's unbearable when DNS queries aren't being served either. It's running on a dual-core AMD EPYC (3 GHz) KVM that isn't showing performance issues otherwise. Looking at htop, I did catch a short-lived zstate thread that keeps coming up every 60 s or so, but nothing else stands out aside from the constant 99-100% CPU usage. Stracing the process, I see a ton of nonstop pread calls. The pihole-FTL.db that got created automatically has the following indexes:
The rest of the strace seems to suggest it's bogging down going through the gravity db for non-cached lookups (I'm blocking ~14M domains). There's a bunch of checks for missing -wal/-journal files.
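For an overall picture of where the time goes, a sketch of a syscall summary with strace; attach it for a limited time and read the counts when it exits:

```bash
# Summarize syscalls across all FTL threads for ~30 seconds.
sudo timeout 30 strace -f -c -p "$(pidof pihole-FTL)"
# A table dominated by pread64 points at heavy database file reads.
```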
And my gravity.db is pretty big
pihole -g -c -d seems to end with an error:
I added a 4 GB swap file and ran pihole -g -c -d again; it didn't even touch the swap or run low enough on RAM to need it, and it finished the tree-building process. After that, though, gravity.db is 1.5 GB, whereas it was 750 MB prior to the run with the same lists/number of domains being blocked:
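One generic way to tell whether that extra size is real table/index data or just unreclaimed free pages inside the file (a sketch against the default gravity.db path):

```bash
# page_count * page_size is the file size; freelist_count pages are empty, reusable space.
sudo sqlite3 /etc/pihole/gravity.db \
  "PRAGMA page_size; PRAGMA page_count; PRAGMA freelist_count;"
# A large freelist_count means "VACUUM;" would shrink the file (it temporarily needs
# free disk space roughly the size of the database).
```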
Adding the unused swap fixed the issue for me; the culprit was the gravity tree build failing. I don't see the reason for having to add swap when it didn't even get used. It's almost like the process tries to reserve memory and fails, even though the system had 1 GB free, which should have been enough considering it didn't touch the swap that got added. I didn't dig any further into it since I already wasted enough time. These types of major release updates with possible breaking changes should not be pushed out to the masses with a simple 'pihole -up'. I'm sure plenty of us had custom setups running for years which were now affected, and hours were wasted as a result. Please consider a normal production release cycle for major releases, i.e. leave it up to the end user. Think dist upgrades.
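For anyone trying the same workaround, a minimal sketch of adding a 4 GB swap file on a typical Debian-based host (size and path are only examples):

```bash
sudo fallocate -l 4G /swapfile   # or: sudo dd if=/dev/zero of=/swapfile bs=1M count=4096
sudo chmod 600 /swapfile         # swap files must not be world-readable
sudo mkswap /swapfile
sudo swapon /swapfile
swapon --show                    # confirm the new swap is active
```

Add it to /etc/fstab if it should survive a reboot.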
I am sorry for the issues. We had a rather long beta period with a pretty large number of participants, so we hoped to have covered many special cases. It appears not... Looking at your
Are you sure this all comes from |
All containers stopped, only Pi-hole running.
Thank you for all the hard work put into this project! Any thoughts on the tree build failing without swap? This also seems to be a commonly reported issue now, and it was the direct reason for my CPU-pegged FTL.
I can only assume that it is used for a very brief moment. Maybe it is a bug in
I also could not reproduce it, but I was also not able to create a large query database yet. Is there an easy way to blow it up quickly to 1.5 GiB? I tried running
I only seemed to run into the pegged-CPU issue when gravity didn't finish building its tree and there were 5-30 concurrent queries happening for "new records". Note I had ~14M domains in gravity. If you can skip the tree build, then run something like
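To generate that kind of load from uncached lookups, one rough sketch is to fire random, never-seen-before subdomains at the Pi-hole in a loop (domain, address, and rate are arbitrary placeholders):

```bash
# Roughly 10 uncached queries per second against a local Pi-hole; stop with Ctrl-C.
while true; do
  dig +short +time=2 @127.0.0.1 "test-$RANDOM-$RANDOM.example.com" >/dev/null &
  sleep 0.1
done
```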
What is the solution to this now? I have restored v5 from backup; v6 is unusable due to the high CPU load.
If CPU load is again high after the update:

```
sudo systemctl stop pihole-FTL
sudo mv /etc/pihole/pihole-FTL.db /etc/pihole/pihole-FTL.db.bak
sudo systemctl start pihole-FTL
```

You can also remove the old database, if the old query logs are not important for you:

```
sudo rm /etc/pihole/pihole-FTL.db.bak
```
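After the restart, a quick sanity check that FTL has created a fresh (and now small) database at the default path:

```bash
ls -lh /etc/pihole/pihole-FTL.db
```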
Running Pi-hole v6 (fresh install) on a Raspberry Pi Zero. Had no issues running previous Pi-hole versions on this Pi, have done it since at least 2020. According to
Immediately, the CPU shot back up to 95-99% and continues to stay there.
Edit: I realized that my 8 GB SD card could've been problematic here, so I installed a fresh instance of DietPi on a brand-new 128 GB SD card, installed Pi-hole fresh, restored from backup via Teleporter, then ran
And of course, CPU usage remains pinned at >95% due to
Edit 2: I attempted to increase the swapfile to 12 GB and then re-run the sqlite index creation, but got the same memory error:
The maximum swap usage during the index creation attempt was ~2.5 GB, nowhere near 12 GB (I watched it the whole time via
Edit 3: I copied
While this fixed my problem for now, I sincerely hope this issue is temporary. It seems that I'd have to re-do these steps after every weekly gravity update, which is far from ideal.
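For reference (not necessarily the tool used above), a simple way to watch memory and swap while the index creation or a gravity run is in progress:

```bash
# Refresh overall memory/swap usage every 2 seconds; stop with Ctrl-C.
watch -n 2 free -h
```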
Versions
Platform
Expected behavior
Modest memory and CPU consumption
Actual behavior / bug
High sustained memory and CPU consumption by pihole-FTL. Very slow web interface response.
Steps to reproduce
Steps to reproduce the behavior:
Debug Token
https://tricorder.pi-hole.net/KqhHPC9x/
Screenshots
Additional context
Updated to 6.x from 5.x via pihole -up.