Slow startup - blob cache #14569

Open
Chaz27 opened this issue Oct 23, 2024 · 8 comments
Labels: Windows (Anything related only to Windows OS)

Comments

@Chaz27

Chaz27 commented Oct 23, 2024

INFO filesystem: Blob filesystem cache warm-up complete. elapsed=6m9.5552129s

Is there a way to speed up this process? Some flags or something I can experiment with?

Running version 5.1.2 on Windows.

@prestonvanloon
Member

@Chaz27 can you tell us more about your system? Are you using an SSD? Which one?

When was this node first synced? It sounds like the filesystem was quite slow as Prysm traversed the blob storage to remove old blobs.

@Chaz27
Author

Chaz27 commented Oct 23, 2024

@prestonvanloon

SSD is a 2TB Samsung 980 pro. I believe the node was first synced in Nov 2022.

I ran a couple more tests today. Rebooting the node after about 12 hours of running resulted in the same issue:

INFO filesystem: Blob filesystem cache warm-up complete. elapsed=6m18.895584s

I let the node sync fully, then rebooted it after about 10 seconds and got a totally different result:

INFO filesystem: Blob filesystem cache warm-up complete. elapsed=3.4767708s

So is it some kind of cleanup that only happens after the node has been running for a while? Looking at the SSD performance, it sits at about 11% read (roughly 20 MB/s) during that 6-minute load time.

@kasey
Contributor

kasey commented Oct 24, 2024

@Chaz27 to make sure I understand - in these two instances where you use the term reboot:

  • *Rebooting* the node after about 12 hours of running
  • I let the node sync fully, then *rebooted* it after about 10 seconds

Does reboot here mean reboot/restart the computer/Windows, or restarting the beacon node program? If it's the latter, I'm guessing the second fast start could be due to Windows caching filesystem data in memory.

@Chaz27
Author

Chaz27 commented Oct 24, 2024

@kasey sorry, both times I simply shut the beacon node down and started it again. No reboot of Windows in either case.

I did another test just now after my previous comment which was ~2 hours ago. Same result. Clean shutdown of beacon node console window. Load beacon node, ~6min cache warmup. Run for a couple of slots after synced to head, clean shutdown again. Load beacon node, ~3 second cache warm up.

@prestonvanloon
Member

@Chaz27 This might be Windows-specific behavior. Unfortunately, our knowledge of Windows systems is very limited. We believe it may have something to do with the filesystem cache. Perhaps there is something we can do differently in Prysm for Windows, but we would need an external contributor with knowledge of Windows systems to understand the root cause of the problem and propose a reasonable solution.

I found this page a bit helpful: https://learn.microsoft.com/en-us/windows/win32/fileio/file-caching

I don't think there is much that you can do differently... unless there is some Windows setting?? Not sure, sorry!

@Chaz27
Author

Chaz27 commented Oct 25, 2024

@prestonvanloon No worries, how much time is normal for cache warm up on Linux?

@prestonvanloon
Member

@Chaz27 I just restarted my personal beacon node.

Oct 24 21:34:25 beacon-chain prysm.sh[60213]: time="2024-10-24 21:34:25" level=info msg="Blob filesystem cache warm-up complete." elapsed=50.003637232s prefix=filesystem  

This machine had 85213 directories in the blob folder.

nisdas added the Windows label (Anything related only to Windows OS) on Oct 25, 2024

@Chaz27
Author

Chaz27 commented Oct 25, 2024

Ok, I found the issue after attempting to create a quick C# console app to replicate it. It was Windows Defender. I excluded the blob directory in the scan settings and got a much better result:

INFO filesystem: Blob filesystem cache warm-up complete. elapsed=32.5777581s

Now, does this open me up to any viruses coming through blobs? 😆


4 participants