
Do research on telemetry collectors and how we can use them, and if we are missing to send some events #2649

Open
Tracked by #2546
dborovcanin opened this issue Jan 17, 2025 · 1 comment
@dborovcanin (Collaborator)
No description provided.

@github-project-automation github-project-automation bot moved this to ⛏ Backlog in SuperMQ Jan 17, 2025
@felixgateru felixgateru moved this from ⛏ Backlog to 🚧 In Progress in SuperMQ Jan 20, 2025
@felixgateru felixgateru moved this from 🚧 In Progress to 🩺 Review and testing in SuperMQ Jan 27, 2025
@dborovcanin dborovcanin moved this from 🩺 Review and testing to ⛏ Backlog in SuperMQ Jan 29, 2025
@felixgateru felixgateru moved this from ⛏ Backlog to 🚧 In Progress in SuperMQ Jan 30, 2025
@felixgateru (Contributor)

felixgateru commented Feb 6, 2025

Client Telemetry Processing and Aggregation

Current Telemetry Fields

Client telemetry includes the following fields:

  • First Seen: Updated at client creation.
  • Last Seen: Updated when the client publishes or subscribes to a topic.
  • Inbound Messages: Updated when a client publishes a message.
    • Increases the count of inbound messages for the publishing client.
  • Outbound Messages: Updated when a client publishes a message.
    • Increases the count of outbound messages for each client subscribed to the publisher’s topic.
  • Subscriptions: Managed via subscribe and unsubscribe events from messaging.

Storage and Querying

Telemetry storage and querying are currently handled by Postgres.
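As a rough sketch, the Postgres handling presumably amounts to an upsert per messaging event; the table and column names below are assumptions, not the actual SuperMQ schema:

```sql
-- Hypothetical telemetry table; actual schema may differ.
INSERT INTO clients_telemetry (client_id, domain_id, first_seen, last_seen, inbound_messages)
VALUES ($1, $2, now(), now(), 1)
ON CONFLICT (client_id) DO UPDATE
SET last_seen        = now(),
    inbound_messages = clients_telemetry.inbound_messages + 1;
```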


Investigated Alternatives

1. Prometheus

Prometheus stores all data as time series, a stream of timestamped values for the same metric.

Integration Process

  1. Expose Metrics Endpoint: Metrics must be exposed via an HTTP endpoint for Prometheus to scrape.
  2. Define Metrics in Prometheus: Each telemetry field is defined as a Prometheus metric type.
  3. Scrape Configuration: Prometheus is configured to scrape the service for the metrics.

Prometheus Metrics Mapping

Client Telemetry Field   Prometheus Metric Type
inbound_messages         Counter
outbound_messages        Counter
first_seen               Gauge
last_seen                Gauge
subscriptions            Gauge

Go Integration

Prometheus provides a Go client library for defining metrics.

Querying

Prometheus queries are written in PromQL. It supports selecting and aggregating time series data.
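For example, aggregating per-client message throughput might look like the following (metric names assumed from the mapping above):

```promql
# Per-client inbound message rate over 5 minutes, summed across instances
sum by (client_id) (rate(inbound_messages_total[5m]))

# Number of clients seen in the last hour
count(time() - last_seen_timestamp_seconds < 3600)
```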

Advantages

  1. Supports real-time monitoring and alerting.
  2. Supports high-cardinality metrics.

Disadvantages

  1. Lacks long-term data storage, so a Postgres instance would still need to be maintained.
  2. Eventually consistent.
  3. Lacks relational queries, so aggregation may be difficult.

2. Elasticsearch

Elasticsearch stores data in the form of JSON documents, making schema design straightforward.

Suggested Schema

```json
{
  "mappings": {
    "properties": {
      "client_id": { "type": "keyword" },
      "domain_id": { "type": "keyword" },
      "inbound_messages": { "type": "long" },
      "outbound_messages": { "type": "long" },
      "first_seen": { "type": "date" },
      "last_seen": { "type": "date" }
    }
  }
}
```

Integration Process

  1. Defining the data structure
  2. Setting up Elasticsearch
  3. Data migration
  4. Setting up API calls to Elasticsearch
  5. Setting up queries for aggregation
  6. Exporting data for analytics

Querying

Elasticsearch uses Query DSL (based on JSON) for defining queries. These queries can be exposed as endpoints.
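As a sketch, an aggregation summing inbound messages per domain could be expressed in Query DSL as follows (the index name is assumed; field names follow the suggested schema above):

```json
POST /client_telemetry/_search
{
  "size": 0,
  "aggs": {
    "per_domain": {
      "terms": { "field": "domain_id" },
      "aggs": {
        "total_inbound": { "sum": { "field": "inbound_messages" } }
      }
    }
  }
}
```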

Advantages

  1. Powerful search and aggregations
  2. Time series data support
  3. Near real time data availability

Disadvantages

  1. Commonly flagged as resource-intensive.
  2. Eventually consistent, unlike the strongly consistent Postgres.

3. Apache Druid

Apache Druid is a real-time analytics database designed for fast slice-and-dice analytics on large datasets. Its list of common application areas includes the kind of telemetry aggregation we are trying to achieve.

Suggested Schema

Similar to Elasticsearch

Integration Process

  1. Defining the data structure for telemetry fields in Druid
  2. Setting up Apache Druid
  3. Setting up an ingestion method, supports both real time streaming and batch ingestion
  4. Setting up update and insert queries
  5. Exporting data for analytics
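Druid also exposes a SQL layer, so the aggregation queries in step 4 could be sketched as follows (datasource and column names are assumed to mirror the schema above):

```sql
SELECT domain_id,
       SUM(inbound_messages) AS total_inbound,
       MAX(last_seen)        AS last_activity
FROM client_telemetry
GROUP BY domain_id;
```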

Advantages

  1. Flexible ingestion, supporting both real-time streaming and batch modes.
  2. Flexible queries, with targeted optimization for aggregation and reporting.
  3. Support for high-cardinality data columns.

Disadvantages

  1. Not suitable for low-latency updates of existing records using a primary key.

Reference Links

  1. https://medium.com/@reshra3893/a-beginners-guide-to-elasticsearch-queries-d0205512de2d
  2. https://prometheus.io/docs/prometheus/latest/querying/basics/
  3. https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html
  4. https://druid.apache.org/docs/latest/design/

@felixgateru felixgateru moved this from 🚧 In Progress to 🩺 Review and testing in SuperMQ Feb 6, 2025
@felixgateru felixgateru moved this from 🩺 Review and testing to 🚧 In Progress in SuperMQ Feb 6, 2025
@felixgateru felixgateru moved this from 🚧 In Progress to ⛏ Backlog in SuperMQ Feb 7, 2025
@dborovcanin dborovcanin self-assigned this Feb 12, 2025