WARN beater/o365beat.go:249 start XX must be <=YY hrs ago, resetting #47

Open
rob570 opened this issue May 24, 2020 · 3 comments
Labels: bug

rob570 commented May 24, 2020

Hi,
could anyone please explain to me what happens in this case?
This is the scenario:
Assume I'm asking for all events from the previous YY hours, let's say 24h, and it is now 8:00:00.
o365beat sends a request with a start time of 8:00 yesterday.
But the response arrives at 8:05:00, five minutes after the request (my tenant is really busy).
Then I get this warning. I suppose it's because an event starting at 8:00:00 yesterday is older than 8:05:00 minus 24h.
Do I lose all the events, or only the events older than 24h?
I'm asking because I'm seeing a lot of this warning and a huge number of duplicate Ids, and I'm wondering if that's related to this case.
Thank you for clarifying,
Roberto

@chris-counteractive (Collaborator) commented

Thank you for the issue, Roberto. This is definitely a case that might require some work to account for how the API is designed (the endpoint for listing available content only supports requesting a window of 24 hours or less, and no more than 7 days in the past).

TL;DR: In the short term I recommend increasing the period setting in your config file from the default 5 minutes to something like 30m or 60m to give the beat more time to ship logs before it polls for more, while we look into a more durable fix and/or clearer error messages.
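For reference, and assuming the stock o365beat.yml layout (double-check where the key lives in your own config), that change is just:

o365beat:
  period: 30m   # default is 5m; a larger value gives each batch more time to ship before the next poll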

The message you're seeing (start xx must be <= yy hrs ago, resetting) does not drop all the events for that span; it just resets the start of the window to the oldest content available (by default 7 days, the maximum the API supports, configurable under the content_max_age key). So you'd only lose events outside that content window, in your case the 5 minutes.

In most cases this only happens when the beat is first started, to account for any small deviation between the time of execution and "7 days ago" by Microsoft's reckoning; most of the time the beat quickly moves on to newer content locations and is never on that 7-day boundary again (unless you shut down the beat for a while). In your case, however, you may be seeing it more often because the beat "can't keep up" with your busy tenancy: it can't ship all the events it pulls down before it polls again, so it grabs the same batch (or an overlapping batch), which causes your duplication problem.
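To make the reset concrete, here is a minimal Go sketch of that clamping behavior (illustrative only; clampStart and contentMaxAge are made-up names, not the beat's actual identifiers):

package main

import (
	"log"
	"time"
)

// clampStart pulls a requested start time forward to the oldest content the
// API will still serve; only events older than that boundary are skipped.
func clampStart(start time.Time, contentMaxAge time.Duration) time.Time {
	oldest := time.Now().UTC().Add(-contentMaxAge)
	if start.Before(oldest) {
		log.Printf("start %v must be <= %v ago, resetting", start, contentMaxAge)
		return oldest
	}
	return start
}

func main() {
	// A start 7 days and 5 minutes ago gets pulled forward by 5 minutes.
	start := time.Now().UTC().Add(-(7*24*time.Hour + 5*time.Minute))
	log.Println(clampStart(start, 7*24*time.Hour))
}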

I'm going to tag this as a bug, because there is a disconnect between the time of each event and the time the content becomes available, and retrieving content blobs on a boundary may lead to duplicate events. The error messages also don't clarify why you're in the situation you're in. There are a few things we may need to fix here, including adding a configuration setting to pull less than 24 hours' worth of available content per request (currently hard-coded).
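For anyone following along, here is a rough Go sketch of the windowing involved; pollSpan and list are illustrative names, and this is the shape of the logic rather than o365beat's actual implementation:

package main

import (
	"fmt"
	"time"
)

// pollSpan splits [start, end) into windows no longer than the given size and
// calls list for each one. The content-listing endpoint only accepts windows
// of 24 hours or less, which is why the beat currently hard-codes a 24-hour
// interval.
func pollSpan(start, end time.Time, window time.Duration, list func(s, e time.Time) error) error {
	for s := start; s.Before(end); s = s.Add(window) {
		e := s.Add(window)
		if e.After(end) {
			e = end
		}
		if err := list(s, e); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	now := time.Now().UTC()
	// A smaller window means more (but smaller) listing requests per poll.
	_ = pollSpan(now.Add(-48*time.Hour), now, 24*time.Hour, func(s, e time.Time) error {
		fmt.Printf("list available content from %s to %s\n", s, e)
		return nil
	})
}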

As noted in the TL;DR above, the short-term workaround is to increase the period setting so the beat has more time to ship each batch of logs before it polls for more. Thanks again!

@rob570 (Author) commented May 27, 2020

Thank you, Chris, for your beat and your well-documented code.
I've changed line 314 of o365beat.go from:
interval := 24 * time.Hour
to
interval := 2 * time.Hour
and it's working better, even though I still see the same blob published more than once, like:

2020-05-27T08:09:33.197+0200    DEBUG   [beat]  beater/o365beat.go:366  publishing 46 event(s)
2020-05-27T08:09:33.203+0200    DEBUG   [beat]  beater/o365beat.go:418  published blob created 2020-05-21 03:41:37.619 +0000 UTC, last was 2020-05-21 03:41:37.619 +0000 UTC, updating registry
2020-05-27T08:09:33.203+0200    DEBUG   [beat]  beater/o365beat.go:447  putting registry info (2020-05-21 03:41:37.619 +0000 UTC) to /etc/o365beat/o365beat2.state
2020-05-27T08:09:33.378+0200    DEBUG   [beat]  beater/o365beat.go:366  publishing 46 event(s)
2020-05-27T08:09:33.381+0200    DEBUG   [beat]  beater/o365beat.go:418  published blob created 2020-05-21 03:41:37.619 +0000 UTC, last was 2020-05-21 03:41:37.619 +0000 UTC, updating registry
2020-05-27T08:09:33.381+0200    DEBUG   [beat]  beater/o365beat.go:447  putting registry info (2020-05-21 03:41:37.619 +0000 UTC) to /etc/o365beat/o365beat2.state
2020-05-27T08:09:33.630+0200    DEBUG   [beat]  beater/o365beat.go:366  publishing 12 event(s)
2020-05-27T08:09:33.630+0200    DEBUG   [beat]  beater/o365beat.go:418  published blob created 2020-05-21 03:41:51.441 +0000 UTC, last was 2020-05-21 03:41:37.619 +0000 UTC, updating registry
2020-05-27T08:09:33.630+0200    DEBUG   [beat]  beater/o365beat.go:447  putting registry info (2020-05-21 03:41:51.441 +0000 UTC) to /etc/o365beat/o365beat2.state
2020-05-27T08:09:33.854+0200    DEBUG   [beat]  beater/o365beat.go:366  publishing 12 event(s)
2020-05-27T08:09:33.854+0200    DEBUG   [beat]  beater/o365beat.go:418  published blob created 2020-05-21 03:41:51.441 +0000 UTC, last was 2020-05-21 03:41:51.441 +0000 UTC, updating registry
2020-05-27T08:09:33.854+0200    DEBUG   [beat]  beater/o365beat.go:447  putting registry info (2020-05-21 03:41:51.441 +0000 UTC) to /etc/o365beat/o365beat2.state

But now, at least, the registry is updating and records are flowing.
On the Elasticsearch side I solved it by setting the document Id to the Id given by Microsoft.
Thank you again for your support,
Roberto

P.S. In the past day I tried modifying the poll time (even to 600m) but it didn't help. I also tried syncing one content type (Audit.Exchange, Audit.AzureActiveDirectory, ...) at a time, with poor results.

@chris-counteractive (Collaborator) commented

Roberto - so glad you were able to make some progress, and thank you for the additional feedback and clarity. It's going to take a bit to work through what's happening here; I'm not sure exactly why changing that interval produced such a benefit, but I'm glad it did, and I'll get to an answer soon. I'll let you know when I have a durable fix that avoids the duplication you're seeing.
