-
Notifications
You must be signed in to change notification settings - Fork 170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TRACKING] Create a more coherent stats model for libs and consumer #1463
Comments
@Andreagit97 answering here re #1433 (comment)
The Falco client shall continue to support (1) a simple file sink and (2) the internal rule (JSON formatted) and (3) a prometheus exporter was requested as additional option, but it's not yet on the concrete roadmap, maybe Falco 0.38? As infrastructures / environments are very complex and vary among adopters I believe each of these option is valid and needed. Personally I have experienced that Prometheus only will not solve all of your problems when looking at monitoring very large and diverse environments holistically and the "bring your own metrics" over the internal Falco rule can shine in terms of simplicity and ease of integration into the existing alert output sink that you have to setup anyways if you use Falco (especially non Kubernetes environments). That being said if the Prometheus struct will have advantages why not? Furthermore, I think the current scap stats v2 schema is not bad and we would be able to convert to Prometheus easily. @Andreagit97 since you brought this up I was actually already thinking about How To for Falco. Let me attempt to share some more thoughts. On the one hand we have already decided that we want the Prometheus exporter, on the other hand we have also experienced firsthand that such items are a significant LOE and may be only needed or relevant for some time until the next trend arises (see for example the old k8s client or mesos etc) -- we would not know beforehand. And maybe instead of a full client we can create a solution that pushes some JSON we already have up (I have to first explore options and learn what's possible). In summary, I think we should discuss our output sinks in general before rushing into decisions as we have to maintain them as well (more maintenance overhead than our current internal metrics btw) and all we want is some custom metrics to check if Falco goes wonky, it's not our tools primary mission to emit metrics. Maybe @leogr you would have additional thoughts? Let's also checkout some related projects:
OpenTelemetry should be on our radar as well and lastly for example the jaeger project changed approaches frequently wrt to the output sinks. |
There was an additional comment by @Andreagit97 #1433 (review) First the pain of scap <-> sinsp separation that doesn't really apply to the stats / metrics anyways becomes very evident. Perhaps significant renaming can help for sure. And in addition yes that Therefore, to reiterate the patch in #1433 is an intermediary step only that basically extends the buffer that was initially only created for the sinsp resource utilization metrics. |
Uhm got it, they are all valid points, thank you for pointing them out!
In general, I think that we need to involve more the community in these choices, the ones listed above are just my opinions but maybe they are completely wrong... one idea could be to post somewhere a question like "How would you like to collect Falco metrics? A prometheus exporter? a custom rule? or whatever" WDYT?
I'm in favor of an intermediary patch to expose new metrics what I was just saying in the PR was that IMO we need to start to have a little bit clearer what we want to expose from libs and in which form, we have many metrics in many different places and it starts to become difficult to not introducing bugs or to expose stable APIs for libs users. So the idea was, it's ok to have other metrics but let's try to find a glue between all the things we want to expose. I will try to play a little bit with the actual methods we have, to see if we can find a way to collect them all under a unique place. The aim is still to have it in 0.14.0 I will try to propose some possible approaches in the next few days. If you are on PTO I can try to lead the workflow, if you are fine with that of course. Let's see what we can do :) |
[1] I always encourage collaboration with SREs to gain insights into their day-to-day approaches. For instance, writing to (log rotated) files and subsequently monitoring them for log transport options is a well-established practice in infrastructure management. [2] Prometheus: There may be more options to support a Prometheus exporter than implementing a full Prometheus client or adding a new dependency to the project. For instance, Prometheus also supports push, not just pull, and /metrics could be exposed via atomically writing to a file with a POM-compatible string per line containing the metrics. If this approach is suitable for Falco, it would be fast, easy, and low-maintenance. Given that these potential approaches are based on some informal discussions, I would recommend conducting initial experiments. [3] I understand that for example scap has undergone numerous refactoring efforts this year. However, transparent plans outlining these changes were not shared or discussed beforehand. I'm curious about the rationale behind treating metrics differently in this refactoring. Proposing that we move forward with the current PR, deploy and test it in production, and gather real-world data to inform our decision-making process. This data will help us ensure that the project's best interests are met. I'm eager to collaborate with the team in December and beyond to develop a more cohesive stats model that aligns with our overall objectives. I recognize that the current PR involves significant cleanup of previous metrics code, which is a valuable contribution. Deferring the implementation of a comprehensive stats model until we have gathered more data will allow us to make more informed decisions and avoid overburdening this PR. |
Issues go stale after 90d of inactivity. Mark the issue as fresh with Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with Provide feedback via https://github.com/falcosecurity/community. /lifecycle stale |
/remove-lifecycle stale |
More ideas for possible follow up PRs by @FedeDP: |
Tackled second point here: #1745 |
/milestone 0.15.0 |
@incertum: The provided milestone is not valid for this repository. Milestones in this repository: [ Use In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/milestone 0.17.0 |
As stated in the most recent libs stats refactor/extension PR #1433: #1433 (comment), I am creating this issue to track future refactors that, instead of adding more metrics that we urgently need, aim to create a new, more coherent stats model (software interface/API) across scap, lisinsp, and consumers such as Falco.
CC @jasondellaluce
The text was updated successfully, but these errors were encountered: