
V6: high sustained CPU and memory load from pihole-FTL #2194

Open
salmundo11 opened this issue Feb 18, 2025 · 33 comments

@salmundo11

Versions

pi@pihole64:~$ sudo pihole -v
Core version is v6.0 (Latest: N/A)
Web version is v6.0 (Latest: v6.0)
FTL version is v6.0 (Latest: v6.0)

Platform

pi@pihole64:~$ uname -a
Linux pihole64 6.6.74+rpt-rpi-v7 #1 SMP Raspbian 1:6.6.74-1+rpt1 (2025-01-27) armv7l GNU/Linux
  • Platform: Raspberry Pi Zero W 64 bit

Expected behavior

Modest memory and CPU consumption

Actual behavior / bug

High sustained memory and CPU consumption by pihole-FTL. Very slow web interface response.

Steps to reproduce


PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                
  717 pihole    10 -10  170556 159512   3328 S  17.5  36.6   7:00.14 pihole-FTL       

Debug Token

  • URL:

https://tricorder.pi-hole.net/KqhHPC9x/


Additional context

Updated to 6.x from 5.x via pihole -up.

@yubiuser
Member

Do you only see high load when accessing the web interface or also during normal operation?

@yubiuser yubiuser transferred this issue from pi-hole/pi-hole Feb 18, 2025
@Tresmos

Tresmos commented Feb 18, 2025

I was just about to create an issue about this as well. I'm having the same issue on three different systems, all running Pi-hole on Docker: one arm64 and two amd64 hosts. Memory usage seems to be close to v5 on all of them. CPU usage, though, increases quite a bit while using the web interface. Anything live-updating (like recent queries and total queries on the home page) also spikes the CPU a lot more than v5.

@salmundo11
Author

Do you only see high load when accessing the web interface or also during normal operation?

No difference between the two. I’m seeing CPU spikes as high as 69%.

This is a secondary, so very little traffic.

@strongville

Also seeing CPU usage as high as 60% during normal operation.
On v5, usage was between 5% and 10%, but I can't downgrade at the moment to recheck this percentage.

Image

root@pihole:~# uname -a
Linux pihole 6.8.12-5-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-5 (2024-12-03T10:26Z) x86_64 x86_64 x86_64 GNU/Linux

Debug logs: https://tricorder.pi-hole.net/Hx4uwChN/

@strongville

Just adding to the conversation: some graphs of the container's disk/CPU/memory usage since the upgrade, as reported by the Proxmox host (using Unbound as the recursive DNS resolver; not sure whether that's relevant to the issue). Network usage is pretty much the same as on v5.

Image

Image

Image

Also noticing some random stutter in DNS resolution, maybe related to the huge spike in resource usage (when it happens, resolution takes about 5-10 seconds, compared to the nearly instantaneous response expected for a LAN install).

Happy to provide more info as requested by you devs. Thanks in advance.

@DL6ER
Member

DL6ER commented Feb 19, 2025

Somewhat larger RAM usage is expected for v6.0. It mainly comes from an additional in-memory database we need so we can offer the new server-side Query Log feature with reasonable responsiveness; it needs a few extra B-trees to live in memory to be fast. I am unsure where the disk reads are coming from.

Could you please use something like htop (with custom thread names and tree-view enabled) to check which of the threads is consuming the extra CPU or if it is the main process and something like iotop to get a similar measure for the I/O the threads are causing?
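If htop/iotop aren't immediately available, a rough per-thread CPU snapshot can also be pulled straight from /proc. This is a hedged, Linux-only sketch (it inspects its own PID for the demo; point it at FTL's PID from `pidof pihole-FTL` instead):

```python
"""Rough stand-in for htop's per-thread view: sum utime+stime per
thread name from /proc/<pid>/task/*/stat (Linux only)."""
import os

def thread_cpu_times(pid: int) -> dict:
    """Return {thread_name: cpu_seconds} for all threads of `pid`."""
    ticks = os.sysconf("SC_CLK_TCK")          # clock ticks per second
    times = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        with open(f"{task_dir}/{tid}/stat") as f:
            data = f.read()
        # The thread name sits in parentheses; utime/stime are the
        # 14th/15th whitespace fields, counted from the field after ')'.
        name = data[data.index("(") + 1:data.rindex(")")]
        fields = data[data.rindex(")") + 2:].split()
        utime, stime = int(fields[11]), int(fields[12])
        times[name] = times.get(name, 0.0) + (utime + stime) / ticks
    return times

if __name__ == "__main__":
    # Demo on our own process; substitute the pihole-FTL PID here.
    for name, secs in sorted(thread_cpu_times(os.getpid()).items()):
        print(f"{name:20s} {secs:8.2f} s CPU")
```

On FTL this should list the thread names FTL sets itself (database, housekeeper, civetweb-worker, etc.), with the accumulated CPU seconds per thread.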

@strongville

This is htop with thread names; not seeing sustained spikes at the moment, but quite a lot of CPU time has been used by the database thread (10 of the 16 hours the FTL process has been alive, if I'm not mistaken). Usage for this thread climbs to ~15% and then drops back to zero.

Image

I need a reboot to query I/O % for the container, because of a kernel config flag I need to enable on the host (CONFIG_TASK_DELAY_ACCT), but I can't do that at the moment. For now, FTL appears in iotop only sparingly, about once every ~5 seconds, but nothing sustained.
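As an aside, whether the kernel was built with CONFIG_TASK_DELAY_ACCT can be checked without rebooting. A hedged sketch; the config file locations below are common conventions (not guaranteed on every distro), and on newer kernels delay accounting may additionally need the `delayacct` boot parameter even when built in:

```python
"""Check a kernel build option by scanning /proc/config.gz or the
/boot config file; returns None when no config is readable."""
import gzip
import os
import platform

def kernel_config_has(option: str):
    """True/False if `option`=y is (not) set; None if no config found."""
    candidates = ["/proc/config.gz", f"/boot/config-{platform.release()}"]
    for path in candidates:
        if not os.path.isfile(path):
            continue
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as f:
            for line in f:
                if line.strip() == f"{option}=y":
                    return True
        return False  # config found, option not enabled
    return None

if __name__ == "__main__":
    print("CONFIG_TASK_DELAY_ACCT:", kernel_config_has("CONFIG_TASK_DELAY_ACCT"))
```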

@salmundo11
Author

One observation from today: CPU consumption was low until I accessed the dashboard via browser; then CPU usage increased dramatically.

Here is htop output with the dashboard being accessed:

  0[**                                                           2.7%] Tasks: 29, 19 thr, 107 kthr; 2 running
    1[*                                                            0.7%] Load average: 0.38 0.28 0.16 
    2[#############***************                                41.6%] Uptime: 21:32:26
    3[###*                                                         5.9%]
  Mem[|||||||||||||||||||||#*@@@@@@@@@@@@@@@@@@@@@@@@@@@@     133M/426M]
  Swp[|||||##                                                13.8M/200M]

  [Main] [I/O]
  PID USER       PRI  NI  VIRT   RES   SHR S  CPU%▽MEM%   TIME+  Command                                                                          
  717 pihole      10 -10 96696 87860  5120 S  44.8 20.2  1h52:04 /usr/bin/pihole-FTL -f                                                           
  848 pihole      10 -10 96696 87860  5120 D  40.2 20.2  1h50:36 /usr/bin/pihole-FTL -f
  855 pihole      10 -10 96696 87860  5120 S   4.6 20.2  0:02.91 /usr/bin/pihole-FTL -f
27896 pi          20   0  8380  3456  2816 R   2.0  0.8  0:00.22 htop
  849 pihole      10 -10 96696 87860  5120 S   0.7 20.2  0:13.83 /usr/bin/pihole-FTL -f

Here is htop with the dashboard closed:

 0[                                                             0.0%] Tasks: 29, 19 thr, 109 kthr; 1 running
    1[                                                             0.0%] Load average: 0.16 0.23 0.17 
    2[                                                             0.0%] Uptime: 21:35:39
    3[#*                                                           2.0%]
  Mem[|||||||||||||||||||||#*@@@@@@@@@@@@@@@@@@@@@@@@@@@@     135M/426M]
  Swp[|||||##                                                13.8M/200M]

  [Main] [I/O]
  PID USER       PRI  NI  VIRT   RES   SHR S  CPU%▽MEM%   TIME+  Command                                                                          
28032 pi          20   0  8380  3456  2816 R   1.3  0.8  0:00.14 htop                                                                             
  717 pihole      10 -10 96688 87696  5120 S   0.7 20.1  1h52:20 /usr/bin/pihole-FTL -f
  848 pihole      10 -10 96688 87696  5120 S   0.7 20.1  1h50:39 /usr/bin/pihole-FTL -f

@salmundo11
Author

Here is what you actually asked for, but CPU usage is now low:

  0[#*                                                           1.3%] Tasks: 29, 19 thr, 108 kthr; 1 running
    1[#**                                                          3.3%] Load average: 0.12 0.17 0.17 
    2[                                                             0.0%] Uptime: 21:48:02
    3[                                                             0.0%]
  Mem[|||||||||||||||||||||#*@@@@@@@@@@@@@@@@@@@@@@@@@@@      135M/426M]
  Swp[|||||##                                                14.7M/200M]

  [Main] [I/O]
  PID△USER       PRI  NI  VIRT   RES   SHR S  CPU% MEM%   TIME+  Command                                                                          
  393 avahi       20   0  7196  2944  2176 S   0.7  0.7 10:08.74 ├─ avahi-daemon: running [pihole64.local]                                        
  717 pihole      10 -10 96640 87864  5248 S   0.7 20.2  1h52:42 ├─ /usr/bin/pihole-FTL -f
  847 pihole      10 -10 96640 87864  5248 S   0.0 20.2  0:00.16 │  ├─ ntp-client
  848 pihole      10 -10 96640 87864  5248 S   0.7 20.2  1h51:00 │  ├─ database
  849 pihole      10 -10 96640 87864  5248 S   0.0 20.2  0:14.10 │  ├─ housekeeper    
  850 pihole      10 -10 96640 87864  5248 S   0.0 20.2  0:04.98 │  ├─ dns-client       
  851 pihole      10 -10 96640 87864  5248 S   0.0 20.2  0:25.78 │  ├─ timer       
  852 pihole      10 -10 96640 87864  5248 S   0.0 20.2  0:02.62 │  ├─ civetweb-timer 
  853 pihole      10 -10 96640 87864  5248 S   0.0 20.2  0:01.99 │  ├─ civetweb-master  
  854 pihole      10 -10 96640 87864  5248 S   0.0 20.2  0:07.49 │  ├─ civetweb-worker
  855 pihole      10 -10 96640 87864  5248 S   0.0 20.2  0:07.61 │  ├─ civetweb-worker
  856 pihole      10 -10 96640 87864  5248 S   0.0 20.2  0:10.04 │  ├─ civetweb-worker    
 1312 pihole      10 -10 96640 87864  5248 S   0.0 20.2  0:00.00 │  ├─ ntp-server4
 1313 pihole      10 -10 96640 87864  5248 S   0.0 20.2  0:00.00 │  └─ ntp-server6

@DL6ER
Member

DL6ER commented Feb 19, 2025

Could you please run

pihole-FTL sqlite3 /etc/pihole/pihole-FTL.db "pragma integrity_check;"

It will probably take some time (and may require sudo), but if it returns anything other than ok, then the slowness is due to errors in the database. Try

sudo systemctl stop pihole-FTL
sudo mv /etc/pihole/pihole-FTL.db /etc/pihole/pihole-FTL.db_
sudo systemctl start pihole-FTL

Does this solve your slowness issues?
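For reference, the same integrity check can be run through Python's bundled sqlite3 module. A self-contained sketch on a throwaway database; on a real install you would pass /etc/pihole/pihole-FTL.db (with pihole-FTL stopped first):

```python
"""Self-contained sketch of the PRAGMA integrity_check step using
Python's bundled sqlite3 module; demoed on a throwaway database."""
import os
import sqlite3
import tempfile

def integrity_ok(db_path: str) -> bool:
    """Run PRAGMA integrity_check and return True iff it reports 'ok'."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute("PRAGMA integrity_check;").fetchall()
    finally:
        con.close()
    return rows == [("ok",)]

if __name__ == "__main__":
    # Demo on a fresh database, which trivially passes the check.
    with tempfile.TemporaryDirectory() as tmp:
        db = os.path.join(tmp, "demo.db")
        con = sqlite3.connect(db)
        con.execute("CREATE TABLE queries (id INTEGER PRIMARY KEY, domain TEXT)")
        con.commit()
        con.close()
        print("integrity ok:", integrity_ok(db))
```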

@salmundo11
Author

salmundo11 commented Feb 19, 2025

pi@pihole64:~$ sudo pihole-FTL sqlite3 /etc/pihole/pihole-FTL.db "pragma integrity_check;"
ok

I'm not seeing high CPU utilization, nor issues with the web interface.

Would it make sense to run this after doing the 6.x update, or is something similar going on in the background, and perhaps the source of the high CPU?

@salmundo11
Author

Here is another highish CPU snapshot:

    0[#**                                                          4.0%] Tasks: 29, 19 thr, 109 kthr; 2 running
    1[###                                                          3.9%] Load average: 0.37 0.38 0.23 
    2[##########**                                                16.1%] Uptime: 23:40:12
    3[#####***                                                    10.3%]
  Mem[||||||||||||||||||||#*@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    126M/426M]
  Swp[|||||||#                                               21.1M/200M]

  [Main] [I/O]
  PID△USER       PRI  NI  VIRT   RES   SHR S  CPU% MEM%   TIME+  Command                                                                          
  393 avahi       20   0  7196  2688  2048 S  24.4  0.6 11:29.19 ├─ avahi-daemon: running [pihole64.local]                                        
  717 pihole      10 -10 95808 87556  5504 S   5.3 20.1  1h56:22 ├─ /usr/bin/pihole-FTL -f
  847 pihole      10 -10 95808 87556  5504 S   0.0 20.1  0:00.17 │  ├─ ntp-client
  848 pihole      10 -10 95808 87556  5504 S   0.0 20.1  1h53:45 │  ├─ database
  849 pihole      10 -10 95808 87556  5504 S   0.0 20.1  0:16.03 │  ├─ housekeeper    
  850 pihole      10 -10 95808 87556  5504 S   0.0 20.1  0:05.35 │  ├─ dns-client       
  851 pihole      10 -10 95808 87556  5504 S   0.0 20.1  0:27.51 │  ├─ timer       
  852 pihole      10 -10 95808 87556  5504 S   0.0 20.1  0:02.83 │  ├─ civetweb-timer 
  853 pihole      10 -10 95808 87556  5504 S   0.0 20.1  0:02.22 │  ├─ civetweb-master  
  854 pihole      10 -10 95808 87556  5504 S   4.0 20.1  0:24.88 │  ├─ civetweb-worker
  855 pihole      10 -10 95808 87556  5504 R   0.7 20.1  0:23.83 │  ├─ civetweb-worker
  856 pihole      10 -10 95808 87556  5504 S   0.0 20.1  0:25.94 │  ├─ civetweb-worker    
 1312 pihole      10 -10 95808 87556  5504 S   0.0 20.1  0:00.00 │  ├─ ntp-server4
 1313 pihole      10 -10 95808 87556  5504 S   0.0 20.1  0:00.00 │  └─ ntp-server6

@Andrei-Dr

Andrei-Dr commented Feb 20, 2025

I ended up here through Reddit, where a bunch more people are reporting the same issue - https://www.reddit.com/r/pihole/comments/1iss62l/pihole_v6_extremely_slow_gui_high_cpu_usage/

I really wish major upgrades via pihole -up would at least prompt to continue. Some of us had custom nginx/fpm/doh/dot/vpn setups that got completely hosed.

Same issue here ever since upgrading. As soon as I point clients to it, requests start lagging, intermittent outages, and it pegs a single CPU. All 10.255.255 IP's in the recording are local to pihole, and it spikes on each request.

Screen.Recording.2025-02-20.at.6.00.58.PM.mov

If I move /etc/pihole/pihole-FTL.db, then it doesn't get recreated.
I haven't tried uninstalling/reinstalling it because of time constraints but I suppose that's the only "fix" at the moment?

Here's my debug data https://tricorder.pi-hole.net/hD9JvUof/

@DL6ER
Member

DL6ER commented Feb 21, 2025

If I move /etc/pihole/pihole-FTL.db, then it doesn't get recreated.

Why not? Do you have any logs for us?

I haven't tried uninstalling/reinstalling it because of time constraints but I suppose that's the only "fix" at the moment?

No, moving the corrupted database away is what is currently being suggested.

@pralor-bot

This issue has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/web-interface-slow-after-update-from-5-to-6/76280/17

@MichaIng
Contributor

@DL6ER did you already rule out that it could be the database size? Is there a quick way to fill a query db to 1.5 GiB with Pi-hole v6? I'm happy to test a bit on my RPi Zero W.

@DL6ER
Member

DL6ER commented Feb 21, 2025

I tested this myself, blowing up the database with identical copies of the same query (only the ID incrementing) to almost 20 GB without any issues. Only when at least one of the indexes is corrupted do we get these issues, as the database then has no choice but to read the entire table and perform a manual search. This seems to be what is happening in virtually all comparable cases I have seen so far. It remains totally unclear to me why the upgrade of sqlite3 from v3.42.0 in v5.25.2 to v3.49.1 in v6.0 causes this. The best guess I have right now is that users (intentionally or not) interrupted some migration steps and sqlite3 didn't recover as it should. We never manually touch the database file ourselves; everything goes either through the library or the CLI shell command.
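The index-vs-full-scan difference is easy to see with EXPLAIN QUERY PLAN. The table and index names below mimic FTL's query_storage and idx_queries_timestamp, but the schema is a simplified stand-in, not FTL's real one:

```python
"""Demonstrate the index-vs-full-scan difference with EXPLAIN QUERY
PLAN on a simplified stand-in for FTL's query_storage table."""
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE query_storage "
            "(id INTEGER PRIMARY KEY, timestamp INTEGER, domain INTEGER)")
con.executemany("INSERT INTO query_storage (timestamp, domain) VALUES (?, ?)",
                [(1_700_000_000 + i, i % 100) for i in range(1000)])

def plan(sql: str) -> str:
    """Concatenate the 'detail' column of EXPLAIN QUERY PLAN output."""
    return " | ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

query = ("SELECT * FROM query_storage "
         "WHERE timestamp BETWEEN 1700000100 AND 1700000200")
plan_before = plan(query)   # without an index: a SCAN over the whole table
con.execute("CREATE INDEX idx_queries_timestamp ON query_storage (timestamp)")
plan_after = plan(query)    # with the index: a SEARCH ... USING INDEX
print("without index:", plan_before)
print("with index:   ", plan_after)
```

A corrupted index effectively puts every timestamp-range query back on the "without index" path, which matches the nonstop preads seen in the straces in this thread.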

@MichaIng
Contributor

MichaIng commented Feb 21, 2025

Can you give me a script to blow up a v5 query db? 1.5 GiB is sufficient I guess, so I do not need to raise my test VM's disk size 😅. Maybe I can replicate it when upgrading to v6.

We are never manually touching the database file ourselves - everything goes either through the library or the CLI shell command.

Yeah, I thought that maybe the library used by the web UI, or the way the web UI makes the query call, might somehow produce a different result compared to the CLI call. But I missed that you had already found an actual corruption to be the cause.

@DL6ER
Member

DL6ER commented Feb 21, 2025

https://github.com/pi-hole/FTL/blob/v5.25.2/test/pihole-FTL.db.sql should generate a minimal v5 database.

@Andrei-Dr

Andrei-Dr commented Feb 21, 2025

If I move /etc/pihole/pihole-FTL.db, then it doesn't get recreated.

Why not? Do you have any logs for us?

I don't know; it would just say the db file was missing.

I haven't tried uninstalling/reinstalling it because of time constraints but I suppose that's the only "fix" at the moment?

No, moving the corrupted database away is what is currently being suggested.

I tried again and this time it was recreated, but the file was barely 400 MB to begin with and it passed integrity checks:

root@buc-m3:~# pihole-FTL sqlite3 /etc/pihole/pihole-FTL.db "pragma integrity_check;"
ok
root@buc-m3:~# systemctl stop pihole-FTL.service
root@buc-m3:~# mv /etc/pihole/pihole-FTL.db{,.bak}
root@buc-m3:~# systemctl start pihole-FTL.service & tail -fvn0 /var/log/pihole-FTL.log
[1] 87500
==> /var/log/pihole-FTL.log <==
2025-02-21 13:33:39.453 UTC [87522M] INFO: ########## FTL started on buc-m3! ##########
2025-02-21 13:33:39.453 UTC [87522M] INFO: FTL branch: master
2025-02-21 13:33:39.453 UTC [87522M] INFO: FTL version: v6.0.1
2025-02-21 13:33:39.453 UTC [87522M] INFO: FTL commit: 62904aef
2025-02-21 13:33:39.453 UTC [87522M] INFO: FTL date: 2025-02-20 23:07:18 +0000
2025-02-21 13:33:39.453 UTC [87522M] INFO: FTL user: pihole
2025-02-21 13:33:39.453 UTC [87522M] INFO: Compiled for linux/amd64 (compiled on CI) using cc (Alpine 14.2.0) 14.2.0
2025-02-21 13:33:39.456 UTC [87522M] INFO: Wrote config file:
2025-02-21 13:33:39.456 UTC [87522M] INFO:  - 152 total entries
2025-02-21 13:33:39.456 UTC [87522M] INFO:  - 134 entries are default
2025-02-21 13:33:39.456 UTC [87522M] INFO:  - 18 entries are modified
2025-02-21 13:33:39.456 UTC [87522M] INFO:  - 0 entries are forced through environment
2025-02-21 13:33:39.459 UTC [87522M] INFO: Parsed config file /etc/pihole/pihole.toml successfully
2025-02-21 13:33:39.459 UTC [87522M] INFO: PID file does not exist or not readable
2025-02-21 13:33:39.459 UTC [87522M] INFO: No other running FTL process found.
2025-02-21 13:33:39.461 UTC [87522M] INFO: PID of FTL process: 87522
2025-02-21 13:33:39.461 UTC [87522M] INFO: listening on 0.0.0.0 port 58
2025-02-21 13:33:39.461 UTC [87522M] INFO: listening on :: port 58
2025-02-21 13:33:39.461 UTC [87522M] INFO: PID of FTL process: 87522
2025-02-21 13:33:39.461 UTC [87522M] WARNING: No database file found, creating new (empty) database
2025-02-21 13:33:39.561 UTC [87522M] INFO: Database version is 1
2025-02-21 13:33:39.561 UTC [87522M] INFO: Updating long-term database to version 2
2025-02-21 13:33:39.563 UTC [87522M] INFO: Updating long-term database to version 3
2025-02-21 13:33:39.566 UTC [87522M] INFO: Updating long-term database to version 4
2025-02-21 13:33:39.583 UTC [87522M] INFO: Updating long-term database to version 5
2025-02-21 13:33:39.611 UTC [87522M] INFO: Updating long-term database to version 6
2025-02-21 13:33:39.613 UTC [87522M] INFO: Updating long-term database to version 7
2025-02-21 13:33:39.616 UTC [87522M] INFO: Updating long-term database to version 8
2025-02-21 13:33:39.621 UTC [87522M] INFO: Updating long-term database to version 9
2025-02-21 13:33:39.623 UTC [87522M] INFO: Updating long-term database to version 10
2025-02-21 13:33:39.630 UTC [87522M] INFO: Updating long-term database to version 11
2025-02-21 13:33:39.634 UTC [87522M] INFO: Updating long-term database to version 12
2025-02-21 13:33:39.637 UTC [87522M] INFO: Updating long-term database to version 13
2025-02-21 13:33:39.640 UTC [87522M] INFO: Updating long-term database to version 14
2025-02-21 13:33:39.643 UTC [87522M] INFO: Updating long-term database to version 15
2025-02-21 13:33:39.646 UTC [87522M] INFO: Updating long-term database to version 16
2025-02-21 13:33:39.649 UTC [87522M] INFO: Updating long-term database to version 17
2025-02-21 13:33:39.653 UTC [87522M] INFO: Updating long-term database to version 18
2025-02-21 13:33:39.655 UTC [87522M] INFO: Updating long-term database to version 19
2025-02-21 13:33:39.657 UTC [87522M] INFO: Updating long-term database to version 20
2025-02-21 13:33:39.662 UTC [87522M] INFO: Updating long-term database to version 21
2025-02-21 13:33:39.666 UTC [87522M] INFO: Database successfully initialized
2025-02-21 13:33:39.672 UTC [87522M] INFO: Imported 0 queries from the on-disk database (it has 0 rows)
2025-02-21 13:33:39.672 UTC [87522M] INFO: Parsing queries in database
2025-02-21 13:33:39.673 UTC [87522M] INFO: Imported 0 queries from the long-term database

I'll point some more clients at it now and see if it still has outages/sluggish perf

@Andrei-Dr

Andrei-Dr commented Feb 21, 2025


I'm still experiencing the same sluggish performance once it reaches around 5-30 q/s. I tried checking the UI, but it's unbearable when DNS queries aren't being served either. It's running on a dual-core AMD EPYC (3 GHz) KVM guest that isn't showing performance issues otherwise. Looking at htop, I did catch a short-lived zstate thread that keeps coming up every 60 s or so, but nothing else stands out aside from the constant 99-100% CPU usage.

Image

Stracing the process, I see a ton of nonstop preads:

Image

Image

The pihole-FTL.db that got created automatically has the following indexes

sqlite> SELECT name, tbl_name, sql
   ...> FROM sqlite_master
   ...> WHERE type = 'index'
   ...> ORDER BY tbl_name, name;
addinfo_by_id_idx|addinfo_by_id|CREATE UNIQUE INDEX addinfo_by_id_idx ON addinfo_by_id(type,content)
client_by_id_client_idx|client_by_id|CREATE UNIQUE INDEX client_by_id_client_idx ON client_by_id(ip,name)
domain_by_id_domain_idx|domain_by_id|CREATE UNIQUE INDEX domain_by_id_domain_idx ON domain_by_id(domain)
forward_by_id_forward_idx|forward_by_id|CREATE UNIQUE INDEX forward_by_id_forward_idx ON forward_by_id(forward)
sqlite_autoindex_network_1|network|
network_addresses_network_id_index|network_addresses|CREATE INDEX network_addresses_network_id_index ON network_addresses (network_id)
sqlite_autoindex_network_addresses_1|network_addresses|
idx_queries_timestamp|query_storage|CREATE INDEX idx_queries_timestamp ON "query_storage" (timestamp)
sqlite> .tables
addinfo_by_id      domain_by_id       network            session
aliasclient        forward_by_id      network_addresses
client_by_id       ftl                queries
counters           message            query_storage
sqlite>

The rest of the strace seems to suggest it's bogging down going through the gravity db for non-cached lookups (I'm blocking ~14M domains).

There's a bunch of checks for missing -wal/-journal files:

87522 22:25:19 stat("/etc/pihole/gravity.db-wal", 0x7ffe898d0d50) = -1 ENOENT (No such file or directory) <0.000015>
87522 22:25:19 stat("/etc/pihole/gravity.db-journal", 0x7ffe898d0ce0) = -1 ENOENT (No such file or directory) <0.000013>
87522 22:25:19 stat("/etc/pihole/gravity.db-wal", 0x7ffe898d0ce0) = -1 ENOENT (No such file or directory) <0.000013>
87522 22:25:19 stat("/etc/pihole/gravity.db-journal", 0x7ffe898d0c90) = -1 ENOENT (No such file or directory) <0.000037>
87522 22:25:19 stat("/etc/pihole/gravity.db-wal", 0x7ffe898d0c90) = -1 ENOENT (No such file or directory) <0.000013>
87522 22:25:19 stat("/etc/pihole/gravity.db-journal", 0x7ffe898d0c90) = -1 ENOENT (No such file or directory) <0.000013>
87522 22:25:19 stat("/etc/pihole/gravity.db-wal", 0x7ffe898d0c90) = -1 ENOENT (No such file or directory) <0.000013>
87522 22:25:19 stat("/etc/pihole/gravity.db-journal", 0x7ffe898d0c90) = -1 ENOENT (No such file or directory) <0.000013>
87522 22:25:19 stat("/etc/pihole/gravity.db-wal", 0x7ffe898d0c90) = -1 ENOENT (No such file or directory) <0.000013>
87522 22:25:19 stat("/etc/pihole/gravity.db-journal", 0x7ffe898d0c90) = -1 ENOENT (No such file or directory) <0.000013>
87522 22:25:19 stat("/etc/pihole/gravity.db-wal", 0x7ffe898d0c90) = -1 ENOENT (No such file or directory) <0.000012>

And my gravity.db is pretty big

root@buc-m3:~# ls -lah /etc/pihole/gravity*
-rw-rw---- 1 pihole pihole 737M Feb 21 13:29 /etc/pihole/gravity.db

/etc/pihole/gravity_backups:
total 584K
drwxr-xr-x 2 pihole pihole 4.0K Feb 21 13:29 .
drwxr-xr-x 8 pihole pihole  20K Feb 21 19:14 ..
-rw-rw---- 1 pihole pihole 180K Feb 21 13:29 gravity.db.1
-rw-rw---- 1 pihole pihole 180K Feb 20 06:07 gravity.db.2
-rw-rw---- 1 pihole pihole 184K Feb 19 22:53 gravity.db.3

pihole -g -c -d seems to end with an error:


  [i] Building tree...
  [✗] Unable to build gravity tree in /etc/pihole/gravity.db_temp

  [i] If you have a large amount of domains, make sure your Pi-hole has enough RAM available

  [i] Number of gravity domains: 20670140 (14133775 unique domains)
  [i] Number of exact denied domains: 46
  [i] Number of regex denied filters: 62
  [i] Number of exact allowed domains: 130
  [i] Number of regex allowed filters: 19
  [✓] Optimizing database
  [✓] Swapping databases
  [✓] Cleaning up stray matter

  [✓] Done.

I added a 4 GB swap file and ran pihole -g -c -d again; it didn't touch the swap or run low enough on RAM to need it, and the tree-building process finished. Afterwards, though, gravity.db is 1.5 GB, whereas it was 750 MB prior to the run with the same lists/number of domains blocked:

  [✓] Building tree
  [i] Number of gravity domains: 20670136 (14133775 unique domains)
  [i] Number of exact denied domains: 46
  [i] Number of regex denied filters: 62
  [i] Number of exact allowed domains: 130
  [i] Number of regex allowed filters: 19
  [✓] Optimizing database
  [✓] Swapping databases
  [✓] Cleaning up stray matter

  [✓] Done.
root@buc-m3:~# ls -lah /etc/pihole/gravity*
-rw-rw-r-- 1 pihole pihole 1.5G Feb 21 23:43 /etc/pihole/gravity.db
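For anyone checking headroom before re-running pihole -g, available memory and swap can be read from /proc/meminfo. A small, purely informational sketch (it reports the numbers; no Pi-hole thresholds are implied):

```python
"""Report MemAvailable and SwapFree from /proc/meminfo (Linux only);
purely informational, no thresholds implied."""

def meminfo() -> dict:
    """Parse /proc/meminfo into {field_name: value_in_kB}."""
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            info[key] = int(rest.split()[0])  # values are reported in kB
    return info

if __name__ == "__main__":
    m = meminfo()
    print(f"MemAvailable: {m['MemAvailable'] // 1024} MiB")
    print(f"SwapFree:     {m.get('SwapFree', 0) // 1024} MiB")
```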

@Andrei-Dr

Adding the (unused) swap fixed the issue for me; the culprit was the failed gravity tree build. I don't see why swap had to be added when it didn't even get used. It's almost as if the process tries to reserve memory and fails, even though the system had 1 GB free, which should have been enough considering it didn't touch the swap that was added. I didn't dig any further into it since I had already wasted enough time.

These types of major release updates with possible breaking changes should not be pushed out to the masses with a simple 'pihole -up'. I'm sure plenty of us had custom setups running for years that were affected, and hours were wasted as a result. Please consider a normal production release cycle for major releases, i.e. leave it up to the end user. Think dist upgrades.

@DL6ER
Member

DL6ER commented Feb 22, 2025

I am sorry for the issues. We had a rather long beta period with a pretty large number of participants so we hoped to have covered many special cases. It appears not...

Looking at your strace/pread64 output, this really seems to be all sqlite3 activity. That the sqlite3 library changed that much was likewise unexpected, as it is usually known to be a very stable library, and there were no hints at all during the beta of anything even remotely similar... At least I am glad it is now fixed for you. Sorry again for the trouble. We don't expect similar major upgrades in the near to mid future.

@mrmuiz

mrmuiz commented Feb 22, 2025

Image

@DL6ER
Member

DL6ER commented Feb 22, 2025

Are you sure this all comes from pihole-FTL alone?

@mrmuiz

mrmuiz commented Feb 22, 2025

Are you sure this all comes from pihole-FTL alone?

All containers are stopped; only Pi-hole is running.

@Andrei-Dr

Andrei-Dr commented Feb 22, 2025

I am sorry for the issues. We had a rather long beta period with a pretty large number of participants so we hoped to have covered many special cases. It appears not...

Looking at your strace /pread64 this really seems to be all sqlite3 stuff. That the sqlite3 library changed that much was likewise unexpected as it is usually known to be a very stable library and there were no hints at all during the beta indicating something even remotely similar... At least I am glad it is now fixed for you. Sorry again for the trouble. We don't expect similar major upgrades in the near- to mid-future.

Thank you for all the hard work put into this project! Any thoughts on the tree build failing without swap? This now also seems to be a commonly reported issue, and it was the direct cause of my pegged-CPU FTL.

@DL6ER
Copy link
Member

DL6ER commented Feb 22, 2025

I can only assume that it is used for a very brief moment. Maybe it is a bug in sqlite3; the problem is that it is super hard to debug when we cannot reproduce it locally. I tried with a VM having 256 MB of memory and hit an out-of-memory kill of sqlite3 by the kernel, but never a high CPU load. With swapping, I saw a high CPU load for a brief moment, but then gravity finished, releasing the CPU back to an idle state. It seems we are not really there yet. Nothing would make me happier than being able to reproduce this myself so I could deliver a satisfying solution to you all...

@MichaIng
Copy link
Contributor

MichaIng commented Feb 22, 2025

I also could not reproduce it, but I was also not able to create a large query database yet. Is there an easy way to blow it up quickly to 1.5 GiB? I tried running getent ahosts loops on all cores in parallel, using Pi-hole DNS on the host, and disabled rate limiting. But it takes hours to even reach 100 MiB database size with several million queries 😄. I guess an sqlite query or loop would be much more effective, but I am not certain enough to know how to do that a way that the database remains valid for Pi-hole.

@Andrei-Dr
Copy link

I can only assume that it is used for a very brief moment. Maybe it is a bug in sqlite3; the problem is that it is super hard to debug when we cannot reproduce it locally. I tried with a VM having 256 MB of memory and hit an out-of-memory kill of sqlite3 by the kernel, but never a high CPU load. With swapping, I saw a high CPU load for a brief moment, but then gravity finished, releasing the CPU back to an idle state. It seems we are not really there yet. Nothing would make me happier than being able to reproduce this myself so I could deliver a satisfying solution to you all...

I only ran into the pegged-CPU issue when gravity didn't finish building its tree and there were 5-30 concurrent queries for "new records". Note I had ~14M domains in gravity. If you can skip the tree build, then running something like for i in {1..50}; do dig @1.2.3.4 example${i}.com & done should do it
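For anyone without dig handy, the same kind of load can be sketched with only the Python standard library. The resolver address below is a placeholder you would replace with your Pi-hole's IP; each lookup uses a unique name so it cannot be answered from cache.

```python
import socket
import struct
import concurrent.futures

RESOLVER = ("192.168.1.2", 53)  # placeholder: your Pi-hole's address

def build_query(domain: str, txid: int = 0) -> bytes:
    """Build a minimal DNS A-record query in RFC 1035 wire format."""
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)  # RD=1, 1 question
    qname = b"".join(bytes([len(p)]) + p.encode() for p in domain.split("."))
    return header + qname + b"\x00" + struct.pack(">HH", 1, 1)  # QTYPE=A, IN

def lookup(i: int) -> None:
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.settimeout(1)
            s.sendto(build_query(f"example{i}.com", txid=i), RESOLVER)
            s.recv(512)  # we only care about generating load, not the answer
    except OSError:
        pass  # timeouts/unreachable resolver are fine for a load test

if __name__ == "__main__":
    with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
        list(pool.map(lookup, range(50)))
```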

@tipp88
Copy link

tipp88 commented Feb 23, 2025

What is the solution to this now? I have restored v5 from a backup; v6 is unusable due to the high CPU load....

@MichaIng
Copy link
Contributor

MichaIng commented Feb 23, 2025

If the CPU load is high again after the update:

sudo systemctl stop pihole-FTL
sudo mv /etc/pihole/pihole-FTL.db /etc/pihole/pihole-FTL.db.bak
sudo systemctl start pihole-FTL

You can also remove the old database if the old query logs are not important to you:

sudo rm /etc/pihole/pihole-FTL.db.bak

@Crypto-Spartan
Copy link

Crypto-Spartan commented Feb 25, 2025

Running Pi-hole v6 (fresh install) on a Raspberry Pi Zero. I had no issues running previous Pi-hole versions on this Pi, and have done so since at least 2020. According to htop, /usr/bin/pihole-FTL -f was using a constant 95-99% of the CPU. I ran the following:

sudo systemctl stop pihole-FTL
sudo mv /etc/pihole/pihole-FTL.db /etc/pihole/pihole-FTL.db.bak
sudo systemctl start pihole-FTL

Immediately, the CPU shot back up to 95-99% and has stayed there.

Image

Edit:

I realized that my 8 GB SD card could have been the problem here, so I installed a fresh instance of DietPi on a brand-new 128 GB SD card, installed Pi-hole fresh, restored from backup via Teleporter, then ran pihole -g. Even with a 4 GB swap file, it is still failing to build the tree:

Image

Image

And of course, CPU usage % remains pinned at >95% due to /usr/bin/pihole-FTL -f

Edit 2:

I attempted to increase the swap file to 12 GB and then re-run the sqlite index creation, but got the same memory error:

$ free -h
               total        used        free      shared  buff/cache   available
Mem:           475Mi        29Mi       436Mi          0B        21Mi       446Mi
Swap:           11Gi        16Mi        11Gi
$ sudo service pihole-FTL stop
$ cd /etc/pihole
$ cp gravity.db gravity.db.bak
$ pihole-FTL sqlite3 -ni "gravity.db" "CREATE INDEX idx_gravity ON gravity (domain, adlist_id);"
Error: stepping, out of memory (7)

The maximum swap usage during the index-creation attempt was ~2.5 GB, nowhere near 12 GB (I watched it the whole time via htop). At the time this error was shown, regular memory usage was ~120 MB and swap was 800-900 MB. I have no idea how to get this index created at this point, but I know it's necessary to bring CPU usage down.

Edit 3:

I copied gravity.db to another system and ran CREATE INDEX idx_gravity ON gravity (domain, adlist_id); via the sqlite3 CLI. This doubled the size of the database (2.3 GB > 4.6 GB). I then transferred gravity.db back to the Pi Zero (and verified checksums to be safe) and ran sudo systemctl start pihole-FTL. CPU usage dropped drastically; it's now around ~1-4%.

Image

While this fixed my problem for now, I sincerely hope this situation is temporary: it seems I'd have to redo these steps after every weekly gravity update, which is far from ideal.
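Whether the out-of-memory error from Edit 2 can be avoided on-device is untested here, but SQLite can be told to spill the sort for an index build into temp files instead of RAM via PRAGMA temp_store, with PRAGMA cache_size capping the page cache. A sketch of the idea against a throwaway stand-in database follows; on the real system the equivalent statements would go through pihole-FTL sqlite3 against /etc/pihole/gravity.db, which I have not verified on a Zero.

```python
import sqlite3
import os
import tempfile

# Throwaway stand-in for gravity.db; the real table is far larger.
path = os.path.join(tempfile.mkdtemp(), "gravity-demo.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE gravity (domain TEXT, adlist_id INTEGER)")
con.executemany("INSERT INTO gravity VALUES (?, ?)",
                ((f"ad{i}.example.com", i % 5) for i in range(100_000)))
con.commit()

con.execute("PRAGMA temp_store = FILE")   # sort in temp files, not memory
con.execute("PRAGMA cache_size = -2048")  # cap the page cache at ~2 MiB
con.execute("CREATE INDEX idx_gravity ON gravity (domain, adlist_id)")
con.commit()

indexes = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")]
print(indexes)  # ['idx_gravity']
```

On a small SD card, the SQLITE_TMPDIR environment variable can additionally point those temp files at a larger filesystem or USB drive.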
