Resident memory usage grows significantly over days of uptime #682

Open
jtroup opened this issue Dec 11, 2020 · 35 comments

Comments

@jtroup

jtroup commented Dec 11, 2020

Matterhorn's base memory usage seems to be 1 TB virtual / 300 MB resident. After ~5 days of uptime, that had grown to 1821 MB resident, making it the largest single consumer of resident memory on my machine. Is there any way to diagnose where that memory is going?

(Thankfully, because Matterhorn is stateless, I can simply restart it every couple of days to mitigate this.)

@jtdaugherty
Member

We can do memory profiling to get a sense for what is causing the memory usage. Are you in a position to build from source? If so, I can give you instructions for building and running with memory profiling enabled.

@jtroup
Author

jtroup commented Dec 11, 2020

I've not built from source before, but happy to try :-)

@jtdaugherty
Member

If you want to try, here are the steps:

https://github.com/matterhorn-chat/matterhorn/blob/master/docs/BUILDING.md

@jtdaugherty
Member

@jtroup any progress with doing a build? When you get that working, then we can adjust the build process to enable profiling.

@jtroup
Author

jtroup commented Dec 28, 2020

Sorry, this fell off my radar. Spinning up a lxd to try a build now.

@jtroup
Author

jtroup commented Dec 28, 2020

OK, I've got matterhorn built from source and running.

FWIW, the build-from-source instructions were generally good, but I did have to manually install the zlib1g-dev Ubuntu package; without it, I got this:

https://paste.ubuntu.com/p/NR3wFBmdf2/

This is with a fresh Ubuntu Focal (20.04) lxd container, FWIW.

@jtdaugherty
Member

Thanks, I've updated the build instructions.

To build with profiling enabled:

  1. Edit build.sh and change the last line so that it reads
    cabal new-build -j --enable-tests --enable-profiling
    
  2. Run ./build.sh

To run and generate a profile for us to check out:

  1. Run ./run.sh +RTS -h
  2. Once Matterhorn has been running long enough to take up memory as you described, quit it and comment here, attaching the matterhorn.hp file that is produced when Matterhorn exits.

@jtroup
Author

jtroup commented Dec 29, 2020

OK, that's now running:

1001000  1980655  101  1.1 1074518176 240320 pts/1 Sl+ 00:18   0:11                  \_ /home/ubuntu/matterhorn/dist-newstyle/build/x86_64-linux/ghc-8.8.4/matterhorn-50200.11.0/x/matterhorn/build/matterhorn/matterhorn +RTS -h

I'll leave it a few days and report back as requested.

@jtdaugherty
Member

Thank you!

@jtroup
Author

jtroup commented Jan 4, 2021

I left matterhorn largely unused since then and it grew to over 900 MB resident memory.

The resulting matterhorn.hp file is 17 GB; I've compressed it using zstd and uploaded it here:

https://people.canonical.com/~james/nx/matterhorn.hp.zst
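A heap profile of this size is awkward to open in a viewer, so a quick first look can be had by summing the bytes in each sample. The sketch below is a minimal example, assuming the usual GHC `.hp` layout (samples delimited by `BEGIN_SAMPLE <time>` and `END_SAMPLE <time>` lines, with one band per line ending in a byte count). The function name `hp_totals` is made up for illustration.

```shell
# Print "<sample time> <total bytes>" for every sample in a GHC heap profile.
# Reads the file named by the first argument, or standard input if none is given.
hp_totals() {
    awk '
        # A new sample starts: remember its timestamp and reset the running total.
        $1 == "BEGIN_SAMPLE" { t = $2; total = 0; in_sample = 1; next }
        # The sample ends: emit its timestamp and the summed byte count.
        $1 == "END_SAMPLE"   { printf "%s %.0f\n", t, total; in_sample = 0; next }
        # Inside a sample, the last field of each line is that band'"'"'s byte count.
        in_sample            { total += $NF }
    ' "${1:--}"
}
```

Because it streams, the compressed upload could be inspected without unpacking it to disk, for example with `zstd -dc matterhorn.hp.zst | hp_totals`. The output only shows whether the profiled heap grows over time; it says nothing about which band is responsible.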

@jtdaugherty
Member

Thank you!

@jtdaugherty
Member

@jtroup How would you characterize the channel activity over the time period of this profile? Were the channels idle, somewhat active, extremely active, etc.?

@jtroup
Author

jtroup commented Jan 4, 2021

> @jtroup How would you characterize the channel activity over the time period of this profile? Were the channels idle, somewhat active, extremely active, etc.?

@jtdaugherty Mostly idle - our company was in its end-of-year shutdown period.

@jtdaugherty
Member

Okay, thanks!

@jtdaugherty
Member

Would you be willing to run Matterhorn again to generate a retainer profile? That can be done with the following command in your profiling-enabled build:

./run.sh +RTS -hr

The resulting matterhorn.prof file is what we'll need.

@jtroup
Author

jtroup commented Jan 5, 2021

Sure, running this now - do you need it after a similar amount of uptime or does it not matter?

@jtdaugherty
Member

The amount of time is less important than the growth in residency, so whatever runtime yields that will be helpful.

@jtroup
Author

jtroup commented Jan 5, 2021

matterhorn.prof.gz

The above is from a matterhorn session that grew from 353372 to 522480 resident. Let me know if you need anything more.
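For anyone who wants to record this kind of growth over time rather than spot-check it, here is a minimal shell sketch. It assumes Linux with /proc mounted. The function names `rss_kib` and `log_rss` and the default log file name are made up for illustration. The values it reports are in KiB, which is presumably the unit of the figures above, though the thread does not say.

```shell
# Print the resident set size of a process in KiB, read from /proc.
rss_kib() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Append "<epoch seconds> <rss in KiB>" to a log file at a fixed interval
# until the process exits.
#   $1 = pid, $2 = interval in seconds (default 60), $3 = log file (default rss.log)
log_rss() {
    pid="$1"
    interval="${2:-60}"
    log="${3:-rss.log}"
    while kill -0 "$pid" 2>/dev/null; do
        printf '%s %s\n' "$(date +%s)" "$(rss_kib "$pid")" >> "$log"
        sleep "$interval"
    done
}
```

With Matterhorn running in another terminal, something like `log_rss "$(pgrep -x matterhorn)" 300` would take a reading every five minutes, assuming exactly one matching process.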

@jtdaugherty
Member

Did that run of Matterhorn also yield a matterhorn.hp file (with the same modification time as the .prof file)?

@jtdaugherty
Member

(If so, I'll need that as well.)

@jtroup
Author

jtroup commented Jan 5, 2021

@jtdaugherty
Member

Thanks! I didn't realize I'd need it either. 😊

@jtdaugherty
Member

@jtroup Would it be possible for you to generate another profile with +RTS -h over just one hour?

@jtroup
Author

jtroup commented Jan 13, 2021

Sure:

https://people.canonical.com/~james/nx/matterhorn_1hr_RTS-h.hp.gz

Unfortunately, I got stuck on a call, so it ended up being 90m, not 60m. Lemme know if you want me to rerun more accurately.

@jtroup
Author

jtroup commented Jun 29, 2021

FWIW, this continues to be a problem for me. I just restarted Matterhorn after 36 hours and it was already at almost 1.2 GB resident. This may be psychosomatic, but it feels like memory usage grows faster when I'm using Matterhorn more (i.e. doing a lot of channel switching via M-a). Not sure if that helps at all.

@jtdaugherty
Member

Thanks for your update, @jtroup - I'm sorry to hear this is still a problem, and I know we have not had resources to dedicate to it yet on our end.

@hegga

hegga commented Sep 21, 2021

Any updates on this? I am also experiencing significant memory usage by matterhorn. Thanks for a great tool by the way! 👏

@jtdaugherty
Member

Hi @hegga, we know this may still affect some users. We have not been able to reproduce the problem in our own environments yet, and lately we have had very limited resources to dedicate to the problem. I'd be delighted to work with any community members with Haskell profiling experience who want to help explore this problem. In the meantime, I recommend restarting Matterhorn when necessary (rather than, say, running it indefinitely on a server as one may be used to doing with IRC clients).

@hegga

hegga commented Sep 27, 2021

Hi @jtdaugherty - I don't think I fit the description "members with Haskell profiling experience", but I have compiled matterhorn on Debian, and will be able to execute commands on instructions if that helps?

@jtdaugherty
Member

Thanks @hegga, I'll keep that in mind. For some kinds of tests that may be all that you'd need to do to help out. Before that, though, we'll need to spend more time directly modifying the code and using Haskell profiling tools.

@tjaalton

I'm suffering from this as well, and am sharing the same server as jtroup so maybe the issue is there?
mattermost/mattermost#8297 is a server bug that got closed a few years ago

5 GB after two weeks of uptime; it seems to grow when switching between channels for me.

@setharnold

Can you suggest if matterhorn has any "easy" memory allocation tunables?

mallopt(3) describes glibc's malloc(3) controls that can improve performance for specific workloads. Does Matterhorn, compiled on Linux, use the standard C library malloc(3) routine for most of its memory allocations? Or does it use e.g. jemalloc or mimalloc? Or is all the memory allocation done via syscalls from Haskell packages?

I wonder if there's some easy workaround for the memory consumption by just changing environment variables or preloading a different malloc implementation.

I can't realistically help debug Haskell memory management -- but I can change environment variables or preload libraries and see how it goes.

Thanks

@jtdaugherty
Member

@setharnold since Matterhorn is written in Haskell and is compiled with GHC, it uses the GHC runtime system's memory allocator and garbage collector. So there's an awful lot of technology in between Matterhorn and malloc. Matterhorn itself doesn't offer any settings related to memory management, although the underlying RTS has some settings for allocation and the garbage collector. Matterhorn can be compiled in a way that exposes those settings to the command-line interface for Matterhorn, so folks who understand those settings could use them to influence how Matterhorn behaves.

With that said, I have never been able to reproduce the problem described in this ticket and I have not observed it. That makes it very hard to debug, and that certainly bothers me - I know that this is a real thing people are experiencing, but it's difficult for me to even form a hypothesis about why it is happening. Matterhorn doesn't do anything to expunge the data it keeps around when it runs for a long time, but that's because we aren't talking about that much data, even for a very active server. Even that kind of use shouldn't cause the memory behavior described here, so I'm really at a loss as to what is going on. Add to that the fact that the Haskell runtime system and garbage collector can be a bit difficult to tame when it comes to memory use in a situation where there's undesirable behavior like this; even so, it's usually fixable if one can reproduce the problem.

@setharnold

@jtdaugherty cool, thanks for the threads to follow up. The machinery is certainly more involved than I expected but the upside is the RTS appears to have loads of potentially useful knobs.

Could you enable the RTS argument parsing in a future build? I'd like to try a few of these. (GHCRTS='-M2G' looks particularly promising. I don't know if two gigs is really enough but it feels like a good starting point.)

Thanks

@jtdaugherty
Member

@setharnold enabling RTS options isn't something I'd do for a regular release. That's best reserved for development and debugging builds. If you want to build Matterhorn yourself then I'd be happy to provide instructions to get them enabled.
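As a rough illustration of the self-build route: GHC's `-rtsopts` link flag is what enables the full set of RTS options, and the `GHCRTS` environment variable is one way to pass them. The sketch below covers only the run side. The helper name `run_with_heap_cap` is hypothetical, and it assumes a binary already linked with `-rtsopts`; without that, the RTS will typically refuse the option at startup.

```shell
# Run a command with a maximum heap size passed to the GHC RTS via GHCRTS.
#   $1 = heap cap as accepted by the RTS -M option (e.g. 2G); the rest is the command.
# Assumption: the target binary was linked with -rtsopts.
run_with_heap_cap() {
    cap="$1"
    shift
    # The assignment prefix applies only to the command being launched.
    GHCRTS="-M${cap}" "$@"
}

# Example (not executed here): run_with_heap_cap 2G ./run.sh
```

A cap like `-M2G` would be expected to turn unbounded growth into a heap-exhaustion failure once the limit is reached, so it bounds the symptom rather than addressing whatever is retaining the memory.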

6 participants