add formatting pattern support #151

Open
wants to merge 5 commits into
base: main

Contributor

@gpop63 gpop63 commented Jun 14, 2024

Overview

This PR introduces the capability to generate field values in a specific format.

A set of standard pattern generators is added: ipv4, ipv6, port, and string. A regex identifies formatting patterns in the field value; each placeholder must use the {generator} form.

Example:

- name: hostIP
  cardinality: 25
  formatting_pattern: "{ipv4}:{port}"

Test with actual config

configs.yml

fields:
  - name: cloud.region
    enum: ["us-east-1", "us-east-2", "us-west-1", "us-west-2", "ap-south-1", "ap-northeast-3", "ap-northeast-2", "ap-southeast-1", "ap-southeast-2", "ap-northeast-1", "ca-central-1", "eu-central-1", "eu-west-1", "eu-west-2", "eu-west-3", "eu-north-1", "sa-east-1", "af-south-1", "ap-east-1", "ap-south-2", "ap-southeast-3", "eu-south-2", "eu-central-2", "me-south-1", "me-central-1"]
    cardinality: 25
  - name: cloud.account.id
    value: "123456789"
  - name: cloud.account.name
    value: sample-account
  - name: aws.billing.currency
    value: "USD"
  - name: aws.billing.ServiceName
    # NOTE: When empty, the data refers to estimated charges for the entire account. We cannot reproduce the content (as it's a sum of previous data), but we want to cover this case.
    enum: ["", "AWSCloudTrail", "AWSCodeArtifact", "AWSConfig", "AWSCostExplorer", "AWSDataTransfer", "AWSELB", "AWSLambda", "AWSMarketplace", "AWSQueueService", "AWSSecretsManager", "AWSServiceCatalog", "AWSSystemsManager", "AWSXRay", "AmazonApiGateway", "AmazonCloudWatch", "AmazonCognito", "AmazonDynamoDB", "AmazonEC2", "AmazonECR", "AmazonEKS", "AmazonKinesis", "AmazonKinesisFirehose", "AmazonRDS", "AmazonRedshift", "AmazonRoute53", "AmazonS3", "AmazonSNS", "AmazonVPC", "awskms"]
  - name: agent.id
    value: "12f376ef-5186-4e8b-a175-70f1140a8f30"
  - name: agent.ephemeral_id
    value: "5fd278ce-2a12-4a09-a125-0c5b39aa69e3"
  - name: agent.name
    value: "host.local"
  - name: metricset.period
    value: 86400
  - name: aws.billing.group_definition.key
    # NOTE: repeated values are needed to produce 10% cases with "" value
    enum: ["", "AZ", "INSTANCE_TYPE", "SERVICE", "LINKED_ACCOUNT", "AZ", "INSTANCE_TYPE", "SERVICE", "LINKED_ACCOUNT"]

  - name: event.duration
    range:
      min: 1
      max: 1000
  - name: aws.billing.EstimatedCharges
    cardinality: 25
    fuzziness: 0.2
  - name: aws.billing.AmortizedCost.amount
    cardinality: 25
    fuzziness: 0.2
  - name: aws.billing.BlendedCost.amount
    cardinality: 25
    fuzziness: 0.2
  - name: aws.billing.NormalizedUsageAmount.amount
    cardinality: 25
    fuzziness: 0.2
  - name: aws.billing.UnblendedCost.amount
    cardinality: 25
    fuzziness: 0.2
  - name: aws.billing.UsageQuantity.amount
    cardinality: 25
    fuzziness: 0.2
  - name: aws.billing.group_definition.type
    value: "DIMENSION"
  - name: aws.billing.group_by.INSTANCE_TYPE
    enum: ["NoInstanceType", "a1.large", "c5.2xlarge", "c5.xlarge", "c6i.2xlarge", "db.r6g.2xlarge", "db.t2.micro", "dc2.large", "m5.large", "t1.micro", "t2.medium", "t2.micro", "t2.small", "t2.xlarge", "t3.2xlarge", "t3.medium", "t3.xlarge","t3.xlarge"]
  - name: aws.billing.group_by.SERVICE
    enum: ["Amazon Simple Storage Service", "Amazon Elastic Compute Cloud - Compute", "EC2 - Other", "Amazon Kinesis", "Amazon Relational Database Service", "Amazon Elastic Load Balancing", "AmazonCloudWatch", "AWS CloudTrail", "AWS Config", "AWS Key Management Service", "AWS Lambda", "AWS Secrets Manager", "AWS Service Catalog", "Amazon API Gateway", "Amazon DynamoDB", "Amazon EC2 Container Registry (ECR)", "Amazon Elastic Container Service for Kubernetes", "Amazon Kinesis Firehose", "Amazon Redshift", "Amazon Simple Notification Service", "Amazon Simple Queue Service", "Amazon Virtual Private Cloud"]
  - name: path
    cardinality: 25
    formatting_pattern: "/home/{string}/{string}/{string}/{string}"
  - name: hostIP
    cardinality: 25
    formatting_pattern: "{ipv4}:{port}"

fields.yml

- name: timestamp
  type: date
- name: path
  type: keyword
- name: hostIP
  type: keyword
- name: cloud.region
  type: keyword
- name: cloud.account.id
  type: keyword
- name: cloud.account.name
  type: keyword
- name: event.duration
  type: long
- name: metricset.period
  type: long
- name: aws.billing.currency
  type: keyword
- name: aws.billing.EstimatedCharges
  type: float
  # positive
- name: aws.billing.ServiceName
  type: keyword
- name: aws.billing.AmortizedCost.amount
  type: float
  # positive
- name: aws.billing.BlendedCost.amount
  type: float
  # positive
- name: aws.billing.NormalizedUsageAmount.amount
  type: integer
  # positive
- name: aws.billing.UnblendedCost.amount
  type: float
  # positive
- name: aws.billing.UsageQuantity.amount
  type: integer
  # positive
- name: agent.id
  type: keyword
- name: agent.name
  type: keyword
- name: agent.ephemeral_id
  type: keyword
  example: 12f376ef-5186-4e8b-a175-70f1140a8f30
- name: aws.billing.group_definition.key
  type: keyword
- name: aws.billing.start_date
  type: date
- name: aws.billing.group_definition.type
  type: keyword
- name: aws.billing.group_by.INSTANCE_TYPE
  type: keyword
- name: aws.billing.group_by.SERVICE
  type: keyword

gotext.tpl

{{- $currency := generate "aws.billing.currency" }}
{{- $groupBy := generate "aws.billing.group_definition.key" }}
{{- $period := generate "metricset.period" }}
{{- $cloudId := generate "cloud.account.id" }}
{{- $cloudRegion := generate "cloud.region" }}
{{- $timestamp := generate "timestamp" }}
{
    "@timestamp": "{{$timestamp.Format "2006-01-02T15:04:05.999999Z07:00"}}",
    "cloud": {
        "provider": "aws",
        "region": "{{$cloudRegion}}",
        "account": {
            "id": "{{$cloudId}}",
            "name": "{{generate "cloud.account.name"}}"
        }
    },
    "event": {
        "dataset": "aws.billing",
        "module": "aws",
        "duration": {{generate "event.duration"}}
    },
    "metricset": {
        "name": "billing",
        "period": {{$period}}
    },
    "ecs": {
        "version": "8.2.0"
    },
    "aws": {
        "billing": {
{{- if eq $groupBy "" }}
            "Currency": "{{$currency}}",
            "EstimatedCharges": {{generate "aws.billing.EstimatedCharges"}},
            "ServiceName": "{{generate "aws.billing.ServiceName"}}"
{{- else }}
{{- $sd := generate "aws.billing.start_date" }}
            "start_date": "{{ $sd.Format "2006-01-02T15:04:05.999999Z07:00" }}",
            "end_date": "{{ $sd | date_modify (print "+" $period "s") | date "2006-01-02T15:04:05.999999Z07:00" }}",
            "AmortizedCost": {
                "amount": {{printf "%.2f" (generate "aws.billing.AmortizedCost.amount")}},
                "unit": "{{$currency}}"
            },
            "BlendedCost": {
                "amount": {{printf "%.2f" (generate "aws.billing.BlendedCost.amount")}},
                "unit": "{{$currency}}"
            },
            "NormalizedUsageAmount": {
                "amount": {{generate "aws.billing.NormalizedUsageAmount.amount"}},
                "unit": "N/A"
            },
            "UnblendedCost": {
                "amount": {{printf "%.2f" (generate "aws.billing.UnblendedCost.amount")}},
                "unit": "{{$currency}}"
            },
            "UsageQuantity": {
                "amount": {{generate "aws.billing.UsageQuantity.amount"}},
                "unit": "N/A"
            },
            "group_definition": {
              "key": "{{$groupBy}}",
              "type": "{{generate "aws.billing.group_definition.type"}}"
            },
            "path": "{{generate "path"}}",
            "hostIP": "{{generate "hostIP"}}",
            "group_by": {
{{- if eq $groupBy "AZ"}}
              "AZ": "{{awsAZFromRegion $cloudRegion}}"
{{- else if eq $groupBy "INSTANCE_TYPE"}}
              "INSTANCE_TYPE": "{{generate "aws.billing.group_by.INSTANCE_TYPE"}}"
{{- else if eq $groupBy "SERVICE"}}
              "SERVICE": "{{generate "aws.billing.group_by.SERVICE"}}"
{{- else if eq $groupBy "LINKED_ACCOUNT"}}
              "LINKED_ACCOUNT": "{{$cloudId}}"
{{- end}}
            }
{{- end}}
        }
    },
    "service": {
        "type": "aws"
    },
    "agent": {
        "id": "{{generate "agent.id"}}",
        "name": "{{generate "agent.name"}}",
        "type": "metricbeat",
        "version": "8.0.0",
        "ephemeral_id": "{{generate "agent.ephemeral_id"}}"
    }
}

go run main.go generate-with-template ./gotext.tpl ./fields.yml --config-file ./configs.yml --tot-events 10

Relates: #141

@gpop63 gpop63 force-pushed the formatting_pattern branch 2 times, most recently from 72495ad to 9ce2671 Compare June 14, 2024 16:08
@gpop63 gpop63 force-pushed the formatting_pattern branch from 9ce2671 to 8efeddc Compare June 14, 2024 16:09
@gpop63 gpop63 marked this pull request as ready for review June 17, 2024 17:00
@gpop63 gpop63 requested a review from a team as a code owner June 17, 2024 17:00
@ali786XI
Contributor

@gpop63 It's good that we can now use formatting_pattern for IPs. However, I hit an issue when hostIP had type ip instead of keyword. With keyword we can generate a value such as {ipv4}:{port}, but a field that holds only an IP must also be declared as keyword. Is this expected, and do we need to call it out for whoever writes the templates?

@gpop63
Contributor Author

gpop63 commented Jun 21, 2024

@aliabbas-elastic Currently I only added it for keyword type but we can add it for ip type as well. We would need some additional checks to only allow {ipv4}, {ipv6} and {port} pattern generators for ip. WDYT?
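
A minimal sketch of what such a check could look like, under stated assumptions: `allowedByType`, `tokenRe`, and `validatePattern` are hypothetical names that do not come from this PR, and the allow-list simply encodes the rule proposed in the comment above.

```go
package main

import (
	"fmt"
	"regexp"
)

// tokenRe captures the generator name inside each {generator} token.
var tokenRe = regexp.MustCompile(`\{([a-z0-9]+)\}`)

// allowedByType lists which generators each field type accepts.
// This table is a hypothetical illustration of the proposed rule.
var allowedByType = map[string]map[string]bool{
	"ip":      {"ipv4": true, "ipv6": true, "port": true},
	"keyword": {"ipv4": true, "ipv6": true, "port": true, "string": true},
}

// validatePattern returns an error if the field type does not support
// formatting patterns or if the pattern uses a generator that is not
// allowed for that type.
func validatePattern(fieldType, pattern string) error {
	allowed, ok := allowedByType[fieldType]
	if !ok {
		return fmt.Errorf("formatting_pattern is not supported for type %q", fieldType)
	}
	for _, m := range tokenRe.FindAllStringSubmatch(pattern, -1) {
		if !allowed[m[1]] {
			return fmt.Errorf("generator {%s} is not allowed for type %q", m[1], fieldType)
		}
	}
	return nil
}

func main() {
	fmt.Println(validatePattern("ip", "{ipv4}:{port}"))  // <nil>
	fmt.Println(validatePattern("ip", "/home/{string}")) // non-nil error
}
```

Running the check once at config-load time would reject a bad pattern before any events are generated.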

@ali786XI
Contributor

@gpop63 Ok. I think it's fine for now as we are able to generate the required formats.

Contributor

@ali786XI ali786XI left a comment

LGTM

@shmsr
Member

shmsr commented Jul 1, 2024

@gpop63 @aliabbas-elastic

What do you think of this?

- name: hostIP
  cardinality: 25
  formatting_pattern: "{ipv4}:{port}|{ipv4}|{ipv6}"

This is just an example config. Suppose hostIP is of type ip (I understand that it is for the keyword type here, but the flexibility would be nice); by definition, it can hold IPv4 or IPv6 addresses.

Why not have a split operator like the one I showed in the config? It gives us more flexibility. There could be cases where the IP appears without a port as well as cases where it appears with one, and there could be more such cases for other types. Given that values for some types can be very dynamic, wouldn't it be nice to support this?

The code will look something like this:

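import (
    "fmt"
    "math/rand"
    "strings"
)

// replacePattern picks one of the "|"-separated options at random and
// substitutes its placeholders with generated values.
func replacePattern(pattern string) (string, error) {
    options := strings.Split(pattern, "|")
    chosenOption := options[rand.Intn(len(options))]

    // Map each placeholder to its generator. The bodies below are
    // illustrative stand-ins for the real logic.
    replacements := map[string]func() string{
        "{ipv4}": func() string {
            return fmt.Sprintf("%d.%d.%d.%d", rand.Intn(256), rand.Intn(256), rand.Intn(256), rand.Intn(256))
        },
        "{ipv6}": func() string {
            groups := make([]string, 8)
            for i := range groups {
                groups[i] = fmt.Sprintf("%x", rand.Intn(1<<16))
            }
            return strings.Join(groups, ":")
        },
        "{port}":     func() string { return fmt.Sprint(1 + rand.Intn(65535)) },
        "{hostname}": func() string { return fmt.Sprintf("host-%d", rand.Intn(1000)) },
    }

    // Replace each placeholder in the chosen option. All occurrences of
    // the same placeholder receive the same generated value.
    for placeholder, replacementFunc := range replacements {
        if strings.Contains(chosenOption, placeholder) {
            chosenOption = strings.ReplaceAll(chosenOption, placeholder, replacementFunc())
        }
    }

    return chosenOption, nil
}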

chosenOption picks one of the options at random, and the rest of the function fills in its placeholders. Also, we do not need to depend on regexp because the patterns are simple; plain string matching is enough.

@shmsr shmsr self-requested a review July 1, 2024 19:40
@shmsr shmsr added the enhancement New feature or request label Jul 1, 2024
3 participants