Nondeterministic results with multiple outputs causing unreliable tests #150

Closed
jgough opened this issue Oct 7, 2021 · 20 comments · Fixed by #158 or #173

@jgough

jgough commented Oct 7, 2021

If multiple outputs are used in the pipeline config, the events Logstash emits can rearrange themselves. This means tests can pass or fail at random, because the order of Logstash's output is nondeterministic.

For example, the following pipeline and test can pass on one execution of LFV, but then fail on the next:

Pipeline:

input {
    stdin {
        id => "input"
    }
}

filter {
    clone {
        clones => [ "cloned_event" ]
    }
}

output {
    if ([type] == "cloned_event") {
        stdout { }
    }

    if ([type] == "original_event") {
        stdout { }
    }
}

Test:

input_plugin: "input"
fields:
  type: original_event
ignore:
  - "@timestamp"
  - "message"
testcases:
  - input:
      - >
        Test message
    expected:
      - type: original_event
      - type: cloned_event

What happens here is that sometimes Logstash outputs original_event before cloned_event, and sometimes cloned_event before original_event.

This may be related to #89, but that ticket is mostly about outputs being rearranged after config changes. This is simply about nondeterministic outputs without config changes.

Below is a Dockerfile that illustrates this issue. It sets up a pipeline with the config and test as above, then runs the same test 10 times.

# syntax=docker/dockerfile:1.3-labs
FROM docker.elastic.co/logstash/logstash:7.10.2

ENV LOGSTASH_FILTER_VERIFIER_VERSION v2.0.0-beta.2

USER root
RUN yum clean expire-cache && yum update -y && yum install -y curl && yum clean all
ADD https://github.com/magnusbaeck/logstash-filter-verifier/releases/download/${LOGSTASH_FILTER_VERIFIER_VERSION}/logstash-filter-verifier_${LOGSTASH_FILTER_VERIFIER_VERSION}_linux_386.tar.gz /opt/
RUN tar xvzf /opt/logstash-filter-verifier_${LOGSTASH_FILTER_VERIFIER_VERSION}_linux_386.tar.gz -C /opt \
    && mv /opt/logstash-filter-verifier /usr/bin/

USER logstash
RUN <<EOF
mkdir tests
mkdir pipeline/pipeline1
mkdir pipeline/pipeline2

cat <<EOT > /usr/share/logstash/config/pipelines.yml
- pipeline.id: pipeline
  path.config: "pipeline/pipeline.conf"
EOT

cat <<EOT > /usr/share/logstash/tests/test.yml
input_plugin: "input"
fields:
  type: original_event
ignore:
  - "@timestamp"
  - "message"
testcases:
  - input:
      - >
        Test message
    expected:
      - type: original_event
      - type: cloned_event
EOT

cat <<EOT > /usr/share/logstash/pipeline/pipeline.conf
input {
    stdin {
        id => "input"
    }
}

filter {
    clone {
        clones => [ "cloned_event" ]
    }
}

output {
    if ([type] == "cloned_event") {
        stdout { }
    }

    if ([type] == "original_event") {
        stdout { }
    }
}
EOT

cat <<EOT > /usr/share/logstash/run_tests.sh
echo "Starting daemon..."
logstash-filter-verifier daemon start &
sleep 5
echo "Running tests 10 times..."
for i in {1..10}
do
  logstash-filter-verifier daemon run --pipeline /usr/share/logstash/config/pipelines.yml --pipeline-base /usr/share/logstash/ --testcase-dir /usr/share/logstash/tests/test.yml --add-missing-id
done
EOT

chmod a+x run_tests.sh
EOF

CMD ["/bin/bash", "/usr/share/logstash/run_tests.sh"]

Build and run with

DOCKER_BUILDKIT=1 docker build --tag test .
docker run --rm test

Sample output below. Note that sometimes the tests fail and sometimes they pass.

Starting daemon...
Running tests 10 times...
☐ Comparing message 1 of 2 from test.yml:
--- /tmp/344674687/test.yml/1/expected  2021-10-07 14:33:54.375837215 +0000
+++ /tmp/344674687/test.yml/1/actual    2021-10-07 14:33:54.375837215 +0000
@@ -1,3 +1,3 @@
 {
-  "type": "original_event"
+  "type": "cloned_event"
 }

☐ Comparing message 2 of 2 from test.yml:
--- /tmp/344674687/test.yml/2/expected  2021-10-07 14:33:54.379837558 +0000
+++ /tmp/344674687/test.yml/2/actual    2021-10-07 14:33:54.379837558 +0000
@@ -1,3 +1,3 @@
 {
-  "type": "cloned_event"
+  "type": "original_event"
 }


Summary: ☐ All tests: 0/2
         ☐ test.yml: 0/2
logstash-filter-verifier: error: failed test cases
☑ Comparing message 1 of 2 from test.yml
☑ Comparing message 2 of 2 from test.yml

Summary: ☑ All tests: 2/2
         ☑ test.yml: 2/2
☑ Comparing message 1 of 2 from test.yml
☑ Comparing message 2 of 2 from test.yml

Summary: ☑ All tests: 2/2
         ☑ test.yml: 2/2
☐ Comparing message 1 of 2 from test.yml:
--- /tmp/664392467/test.yml/1/expected  2021-10-07 14:34:04.492705039 +0000
+++ /tmp/664392467/test.yml/1/actual    2021-10-07 14:34:04.492705039 +0000
@@ -1,3 +1,3 @@
 {
-  "type": "original_event"
+  "type": "cloned_event"
 }

☐ Comparing message 2 of 2 from test.yml:
--- /tmp/664392467/test.yml/2/expected  2021-10-07 14:34:04.496705382 +0000
+++ /tmp/664392467/test.yml/2/actual    2021-10-07 14:34:04.496705382 +0000
@@ -1,3 +1,3 @@
 {
-  "type": "cloned_event"
+  "type": "original_event"
 }


Summary: ☐ All tests: 0/2
         ☐ test.yml: 0/2
logstash-filter-verifier: error: failed test cases
☐ Comparing message 1 of 2 from test.yml:
--- /tmp/176529619/test.yml/1/expected  2021-10-07 14:34:07.636974754 +0000
+++ /tmp/176529619/test.yml/1/actual    2021-10-07 14:34:07.636974754 +0000
@@ -1,3 +1,3 @@
 {
-  "type": "original_event"
+  "type": "cloned_event"
 }

☐ Comparing message 2 of 2 from test.yml:
--- /tmp/176529619/test.yml/2/expected  2021-10-07 14:34:07.640975098 +0000
+++ /tmp/176529619/test.yml/2/actual    2021-10-07 14:34:07.640975098 +0000
@@ -1,3 +1,3 @@
 {
-  "type": "cloned_event"
+  "type": "original_event"
 }


Summary: ☐ All tests: 0/2
         ☐ test.yml: 0/2
logstash-filter-verifier: error: failed test cases
☑ Comparing message 1 of 2 from test.yml
☑ Comparing message 2 of 2 from test.yml

Summary: ☑ All tests: 2/2
         ☑ test.yml: 2/2
☑ Comparing message 1 of 2 from test.yml
☑ Comparing message 2 of 2 from test.yml

Summary: ☑ All tests: 2/2
         ☑ test.yml: 2/2
☑ Comparing message 1 of 2 from test.yml
☑ Comparing message 2 of 2 from test.yml

Summary: ☑ All tests: 2/2
         ☑ test.yml: 2/2
☑ Comparing message 1 of 2 from test.yml
☑ Comparing message 2 of 2 from test.yml

Summary: ☑ All tests: 2/2
         ☑ test.yml: 2/2
☑ Comparing message 1 of 2 from test.yml
☑ Comparing message 2 of 2 from test.yml

Summary: ☑ All tests: 2/2
         ☑ test.yml: 2/2

@nicolasreich

nicolasreich commented Oct 13, 2021

I am encountering the same issue, but I wasn't able to create a proper reproduction scenario, so thanks @jgough for the excellent reproduction steps! I'd like to add that it is not tied to the clone filter specifically but to the use of multiple outputs, and that it also happens between test cases. To illustrate, with a test case containing this (among other things):

...
  - expected:
      - EventID: "4765"
    input:
      - >
          {
           "SourceUserName": null,
           "SourceSid": null,
           "TargetUserName": null,
           "TargetDomainName": null,
           "TargetSid": null,
           "SubjectUserSid": null,
           "SubjectUserName": null,
           "SubjectDomainName": null,
           "SubjectLogonId": null,
           "PrivilegeList": null,
           "SidList": null,
           "log_name": "security",
           "event_id": "4765"
          }
  - expected:
      - timestamp: "toto"
        message: "titi sudo"
        log_category: "srv-unix"
    input:
      - >
          {
           "timestamp": "toto",
           "message": "titi sudo",
           "log_category": "srv-unix",
           "drop_this_field": "gigi"
          }

and a pipeline configuration containing multiple outputs, I get this LFV output:

☐ Comparing message 1 of 2 from splunk_output.test.yaml:                                                                                                                                      
--- /tmp/880506628/splunk_output.test.yaml/1/expected   2021-10-13 14:32:13.643438839 +0200                                                                                                   
+++ /tmp/880506628/splunk_output.test.yaml/1/actual     2021-10-13 14:32:13.643438839 +0200                                                                                                   
@@ -1,3 +1,5 @@                                                                                                                                                                               
 {
-  "EventID": "4765"
+  "log_category": "srv-unix",
+  "message": "titi sudo",
+  "timestamp": "toto"
 }

☐ Comparing message 2 of 2 from splunk_output.test.yaml:
--- /tmp/880506628/splunk_output.test.yaml/2/expected   2021-10-13 14:32:13.643438839 +0200
+++ /tmp/880506628/splunk_output.test.yaml/2/actual     2021-10-13 14:32:13.643438839 +0200
@@ -1,5 +1,3 @@
 {
-  "log_category": "srv-unix",
-  "message": "titi sudo",
-  "timestamp": "toto"
+  "EventID": "4765"
 }


Summary: ☐ All tests: 0/2
         ☐ splunk_output.test.yaml: 0/2
logstash-filter-verifier: error: failed test cases

@jgough
Author

jgough commented Oct 13, 2021

@nicolasreich Great to hear that I'm not the only one encountering this issue! Your issue sounds more like #89, but I think there are several interlinked issues where multiple outputs affect the order. I tried to fix this in #90, but my patch turned out to have some major problems. I don't know the code well enough to propose a working solution.

@nicolasreich

@jgough Actually it's linked to #153: there are multiple outputs but also prune filters in that particular configuration, and adding __lfv_id to the whitelist makes the test suite behave nicely. Thanks a lot for the find!

@breml
Collaborator

breml commented Oct 18, 2021

Just to clarify: is this issue the same as #153, given that in all failing scenarios there has been a filter that modifies some of the __lfv... fields (like prune or clone), or are there situations where the ordering fails without such filters?

@nicolasreich

I haven't encountered situations where it fails without prune or clone

@breml breml added the bug label Oct 22, 2021
@jgough
Author

jgough commented Oct 25, 2021

I'm guessing that after a clone filter both events have exactly the same __lfv_ fields and the ordering between the two is not well defined, which results in this nondeterministic output. I think a fix would have to sort the multiple outputs and work out which event to compare to which. That's what I tried in #90, but I failed miserably.

I believe #153 is about cases where the __lfv fields are completely removed by a plugin (such as prune). I think they are related, but not quite the same.

@breml
Collaborator

breml commented Oct 25, 2021

@jgough I think this problem might be solved by #157 as well, because #157 ensures that __lfv_id as well as __lfv_out_passed are exported. This is important because these fields are used to sort the events consistently, as you can see in https://github.com/magnusbaeck/logstash-filter-verifier/blob/master/internal/app/daemon/run/run.go#L271-L278. First the events are sorted by __lfv_id. If two events have the same __lfv_id, they are then sorted by __lfv_out_passed.
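
For illustration, that two-key ordering amounts to roughly the following (a simplified sketch with made-up type and field names, not the actual code in run.go):

package main

import (
	"fmt"
	"sort"
)

// Event is a stand-in for the events LFV receives back from Logstash; only
// the two sort keys mentioned above are shown.
type Event struct {
	LfvID        string // value of the __lfv_id field
	LfvOutPassed string // value of the __lfv_out_passed field
}

// sortEvents orders events first by __lfv_id and, for events sharing the
// same __lfv_id, by __lfv_out_passed.
func sortEvents(events []Event) {
	sort.SliceStable(events, func(i, j int) bool {
		if events[i].LfvID != events[j].LfvID {
			return events[i].LfvID < events[j].LfvID
		}
		return events[i].LfvOutPassed < events[j].LfvOutPassed
	})
}

func main() {
	events := []Event{
		{LfvID: "1", LfvOutPassed: "output_original"},
		{LfvID: "1", LfvOutPassed: "output_cloned"},
	}
	sortEvents(events)
	fmt.Println(events[0].LfvOutPassed, events[1].LfvOutPassed) // output_cloned output_original
}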

I will add your testcase to the integration tests.

breml added a commit to breml/logstash-filter-verifier that referenced this issue Oct 25, 2021
breml added a commit to breml/logstash-filter-verifier that referenced this issue Oct 26, 2021
breml added a commit that referenced this issue Oct 26, 2021
@jgough
Author

jgough commented Dec 7, 2021

I'm still seeing this issue in trunk.

I've modified the top of the Dockerfile to build LFV from trunk and you can use this to reproduce:

# syntax=docker/dockerfile:1.3-labs
FROM golang:1.17-buster as lfv_builder

RUN git clone https://github.com/magnusbaeck/logstash-filter-verifier
RUN apt update && apt install -y protobuf-compiler
RUN useradd -m -s /bin/bash go && chown -R go /go

USER go
WORKDIR /go/logstash-filter-verifier
RUN make && make check test

FROM docker.elastic.co/logstash/logstash:7.15.2

COPY --from=lfv_builder /go/logstash-filter-verifier /usr/bin

RUN <<EOF
mkdir tests
mkdir pipeline/pipeline1
mkdir pipeline/pipeline2

cat <<EOT > /usr/share/logstash/config/pipelines.yml
- pipeline.id: pipeline
  path.config: "pipeline/pipeline.conf"
EOT

cat <<EOT > /usr/share/logstash/tests/test.yml
input_plugin: "input"
fields:
  type: original_event
ignore:
  - "@timestamp"
  - "message"
testcases:
  - input:
      - >
        Test message
    expected:
      - type: original_event
      - type: cloned_event
EOT

cat <<EOT > /usr/share/logstash/pipeline/pipeline.conf
input {
    stdin {
        id => "input"
    }
}

filter {
    clone {
        clones => [ "cloned_event" ]
    }
}

output {
    if ([type] == "cloned_event") {
        stdout { }
    }

    if ([type] == "original_event") {
        stdout { }
    }
}
EOT

cat <<EOT > /usr/share/logstash/run_tests.sh
echo "Starting daemon..."
logstash-filter-verifier daemon start &
sleep 5
echo "Running tests 10 times..."
for i in {1..10}
do
  logstash-filter-verifier daemon run --pipeline /usr/share/logstash/config/pipelines.yml --pipeline-base /usr/share/logstash/ --testcase-dir /usr/share/logstash/tests/test.yml --add-missing-id
done
EOT

chmod a+x run_tests.sh
EOF

CMD ["/bin/bash", "/usr/share/logstash/run_tests.sh"]

The output is the same, sometimes passing and sometimes failing:

Starting daemon...
Waiting for /tmp/lfv-2769872727/logstash-instance/1sAb7I03/logstash.log to appear...
Daemon listening on /tmp/logstash-filter-verifier.sock
Running tests 10 times...
Ready to process tests
☑ Comparing message 1 of 2 from test.yml
☑ Comparing message 2 of 2 from test.yml

Summary: ☑ All tests: 2/2
         ☑ test.yml: 2/2
☐ Comparing message 1 of 2 from test.yml:
--- /tmp/3793316903/test.yml/1/expected 2021-12-07 15:15:05.619219743 +0000
+++ /tmp/3793316903/test.yml/1/actual   2021-12-07 15:15:05.619219743 +0000
@@ -1,3 +1,3 @@
 {
-  "type": "original_event"
+  "type": "cloned_event"
 }

☐ Comparing message 2 of 2 from test.yml:
--- /tmp/3793316903/test.yml/2/expected 2021-12-07 15:15:05.627220438 +0000
+++ /tmp/3793316903/test.yml/2/actual   2021-12-07 15:15:05.627220438 +0000
@@ -1,3 +1,3 @@
 {
-  "type": "cloned_event"
+  "type": "original_event"
 }


Summary: ☐ All tests: 0/2
         ☐ test.yml: 0/2
logstash-filter-verifier: error: failed test cases
☐ Comparing message 1 of 2 from test.yml:
--- /tmp/3506621347/test.yml/1/expected 2021-12-07 15:15:09.143526012 +0000
+++ /tmp/3506621347/test.yml/1/actual   2021-12-07 15:15:09.143526012 +0000
@@ -1,3 +1,3 @@
 {
-  "type": "original_event"
+  "type": "cloned_event"
 }

☐ Comparing message 2 of 2 from test.yml:
--- /tmp/3506621347/test.yml/2/expected 2021-12-07 15:15:09.147526360 +0000
+++ /tmp/3506621347/test.yml/2/actual   2021-12-07 15:15:09.147526360 +0000
@@ -1,3 +1,3 @@
 {
-  "type": "cloned_event"
+  "type": "original_event"
 }


Summary: ☐ All tests: 0/2
         ☐ test.yml: 0/2
logstash-filter-verifier: error: failed test cases
☑ Comparing message 1 of 2 from test.yml
☑ Comparing message 2 of 2 from test.yml

Summary: ☑ All tests: 2/2
         ☑ test.yml: 2/2
☑ Comparing message 1 of 2 from test.yml
☑ Comparing message 2 of 2 from test.yml

Summary: ☑ All tests: 2/2
         ☑ test.yml: 2/2
☐ Comparing message 1 of 2 from test.yml:
--- /tmp/2716320057/test.yml/1/expected 2021-12-07 15:15:18.772362776 +0000
+++ /tmp/2716320057/test.yml/1/actual   2021-12-07 15:15:18.772362776 +0000
@@ -1,3 +1,3 @@
 {
-  "type": "original_event"
+  "type": "cloned_event"
 }

☐ Comparing message 2 of 2 from test.yml:
--- /tmp/2716320057/test.yml/2/expected 2021-12-07 15:15:18.772362776 +0000
+++ /tmp/2716320057/test.yml/2/actual   2021-12-07 15:15:18.772362776 +0000
@@ -1,3 +1,3 @@
 {
-  "type": "cloned_event"
+  "type": "original_event"
 }


Summary: ☐ All tests: 0/2
         ☐ test.yml: 0/2
logstash-filter-verifier: error: failed test cases
☐ Comparing message 1 of 2 from test.yml:
--- /tmp/3500828995/test.yml/1/expected 2021-12-07 15:15:22.140655487 +0000
+++ /tmp/3500828995/test.yml/1/actual   2021-12-07 15:15:22.140655487 +0000
@@ -1,3 +1,3 @@
 {
-  "type": "original_event"
+  "type": "cloned_event"
 }

☐ Comparing message 2 of 2 from test.yml:
--- /tmp/3500828995/test.yml/2/expected 2021-12-07 15:15:22.144655835 +0000
+++ /tmp/3500828995/test.yml/2/actual   2021-12-07 15:15:22.144655835 +0000
@@ -1,3 +1,3 @@
 {
-  "type": "cloned_event"
+  "type": "original_event"
 }


Summary: ☐ All tests: 0/2
         ☐ test.yml: 0/2
logstash-filter-verifier: error: failed test cases
☐ Comparing message 1 of 2 from test.yml:
--- /tmp/1476337973/test.yml/1/expected 2021-12-07 15:15:25.512948546 +0000
+++ /tmp/1476337973/test.yml/1/actual   2021-12-07 15:15:25.512948546 +0000
@@ -1,3 +1,3 @@
 {
-  "type": "original_event"
+  "type": "cloned_event"
 }

☐ Comparing message 2 of 2 from test.yml:
--- /tmp/1476337973/test.yml/2/expected 2021-12-07 15:15:25.512948546 +0000
+++ /tmp/1476337973/test.yml/2/actual   2021-12-07 15:15:25.512948546 +0000
@@ -1,3 +1,3 @@
 {
-  "type": "cloned_event"
+  "type": "original_event"
 }


Summary: ☐ All tests: 0/2
         ☐ test.yml: 0/2
logstash-filter-verifier: error: failed test cases
☐ Comparing message 1 of 2 from test.yml:
--- /tmp/919424890/test.yml/1/expected  2021-12-07 15:15:28.769231523 +0000
+++ /tmp/919424890/test.yml/1/actual    2021-12-07 15:15:28.769231523 +0000
@@ -1,3 +1,3 @@
 {
-  "type": "original_event"
+  "type": "cloned_event"
 }

☐ Comparing message 2 of 2 from test.yml:
--- /tmp/919424890/test.yml/2/expected  2021-12-07 15:15:28.769231523 +0000
+++ /tmp/919424890/test.yml/2/actual    2021-12-07 15:15:28.769231523 +0000
@@ -1,3 +1,3 @@
 {
-  "type": "cloned_event"
+  "type": "original_event"
 }


Summary: ☐ All tests: 0/2
         ☐ test.yml: 0/2
logstash-filter-verifier: error: failed test cases
☑ Comparing message 1 of 2 from test.yml
☑ Comparing message 2 of 2 from test.yml

Summary: ☑ All tests: 2/2
         ☑ test.yml: 2/2

@jgough
Author

jgough commented Dec 13, 2021

Do I need to do anything to get this reopened or should I file this as another issue?

@breml breml reopened this Dec 13, 2021
@breml
Collaborator

breml commented Dec 13, 2021

From what I can tell, we have already done everything possible to ensure a persistent order of the events, but this still does not guarantee the correct order. The problem is that both events (original_event, cloned_event) have the same LFV ID, which is used as the first key to order the events. To make the ordering (more) consistent, I have the following proposals:

  1. Form a hash (e.g. sha1) of the whole received event and use this hash as the second key (after the LFV ID) for the ordering (a rough sketch follows below).
  2. Allow the user to configure additional fields that should be used to order events with the same LFV ID (in the above example, type would be used as the second key).
  3. Match all events with the same ID against the next expected events and continue with the next event once a match is found.

I assume the above order also reflects the necessary implementation effort (from low to high).
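
A rough sketch of option 1 (hashing the whole received event as a tiebreaker after the LFV ID; the function name and types are only illustrative, not an existing LFV API):

package main

import (
	"crypto/sha1"
	"encoding/json"
	"fmt"
)

// eventHash derives a stable secondary sort key from the whole event.
// encoding/json sorts map keys, so identical events always hash the same.
func eventHash(event map[string]interface{}) string {
	b, _ := json.Marshal(event)
	return fmt.Sprintf("%x", sha1.Sum(b))
}

func main() {
	original := map[string]interface{}{"type": "original_event"}
	cloned := map[string]interface{}{"type": "cloned_event"}
	// Events sharing the same LFV ID could then be ordered by this hash.
	fmt.Println(eventHash(original))
	fmt.Println(eventHash(cloned))
}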

@jgough What do you think?

@jgough
Author

jgough commented Dec 13, 2021

Thanks for reopening. The issue here is the two output plugins. With one output plugin this issue does not occur, so I'm guessing this is something deep in the internals of Logstash with both outputs running in parallel.

I can think of two solutions:

  1. Ensure outputted event ordering is stable (your options 1 and 2?)
  2. Ignore output order completely (your option 3?)

Can you think of Logstash use cases where the output order is significant with a single input and multiple outputs? In my mind, the order of emitted events should not really matter. The order in which events are emitted is just a side effect of the Logstash pipeline, and not something to be tested. What matters more is the content of the output. If two events are generated and they match the two expected events (in any order), then that is a success.

I don't really view expected as an ordered list but more as a set: if all the expected events are generated, it's a pass, no matter what order they arrive in. This may simplify cases that pass but complicate cases that fail, as you don't quite know which events to compare to which. My original idea was simply to sort the events by their JSON string representation and compare them that way; otherwise you get into some horrible heuristics. Anyone dealing with this kind of failure might be slightly confused, but would still be able to see what has happened.
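
To illustrate that idea (just a sketch of the comparison approach, not anything LFV does today):

package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"sort"
)

// sortByJSON orders events by their JSON representation so that the
// comparison no longer depends on the order Logstash emitted them.
func sortByJSON(events []map[string]interface{}) {
	sort.Slice(events, func(i, j int) bool {
		a, _ := json.Marshal(events[i])
		b, _ := json.Marshal(events[j])
		return string(a) < string(b)
	})
}

func main() {
	expected := []map[string]interface{}{
		{"type": "original_event"},
		{"type": "cloned_event"},
	}
	actual := []map[string]interface{}{
		{"type": "cloned_event"}, // Logstash happened to emit the clone first
		{"type": "original_event"},
	}
	sortByJSON(expected)
	sortByJSON(actual)
	fmt.Println(reflect.DeepEqual(expected, actual)) // true
}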

Is that at all helpful?

@breml
Collaborator

breml commented Dec 13, 2021

Thanks for reopening. The issue here is the two output plugins. With one output plugin this issue does not occur, so I'm guessing this is something deep in the internals of Logstash with both outputs running in parallel.

I agree: the two outputs are executed concurrently, and thus the order of the events is neither guaranteed nor controllable.

I can think of two solutions:

  1. Ensure outputted event ordering is stable (your options 1 and 2?)
  2. Ignore output order completely (your option 3?)

Agree.

Can you think of Logstash use cases where the output order is significant with a single input and multiple outputs? In my mind, the order of emitted events should not really matter. The order in which events are emitted is just a side effect of the Logstash pipeline, and not something to be tested. What matters more is the content of the output. If two events are generated and they match the two expected events (in any order), then that is a success.

I don't really view expected as an ordered list but more as a set: if all the expected events are generated, it's a pass, no matter what order they arrive in. This may simplify cases that pass but complicate cases that fail, as you don't quite know which events to compare to which. My original idea was simply to sort the events by their JSON string representation and compare them that way; otherwise you get into some horrible heuristics. Anyone dealing with this kind of failure might be slightly confused, but would still be able to see what has happened.

Now that I think about your last test case again, I think I know the reason why it is not deterministic. Can you please run the tests again with export_outputs: true and with both outputs having an id set? I think we have already hit this case, and there is code to order the received events first by LFV ID and then by the ID of the output the event passed through.

Is that at all helpful?

Yes, thanks.
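
For reference, giving the two outputs explicit IDs in the pipeline above amounts to something like this (the id values are only examples), together with export_outputs: true in the test case file:

output {
    if ([type] == "cloned_event") {
        stdout { id => "output_cloned" }
    }

    if ([type] == "original_event") {
        stdout { id => "output_original" }
    }
}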

@jgough
Author

jgough commented Dec 13, 2021

I can verify that explicitly setting ids on the outputs does indeed fix this issue and the order becomes deterministic. Thanks!

@breml
Collaborator

breml commented Dec 13, 2021

@jgough Do you agree that, in this case, this is more of a documentation issue than a bug in LFV?

@jgough
Author

jgough commented Dec 13, 2021

Not entirely sure here. Is this happening because the IDs added by --add-missing-id are random strings that differ on each execution? I'm not sure that the behaviour is completely obvious, even if documented.

An easy fix would be to document it and remind people to always add IDs to outputs when using more than one.
A better fix might be to make the IDs deterministic (e.g. SHA1 of the plugin name plus line number, sketched below), if that is indeed the issue here.

What do you feel is best here?
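
To sketch what I mean by deterministic IDs (purely illustrative; the helper and its inputs are hypothetical, not LFV's actual implementation):

package main

import (
	"crypto/sha1"
	"fmt"
)

// missingID derives a stable ID for an output that has none, from the plugin
// name and its line number in the config, instead of a random string.
func missingID(pluginName string, line int) string {
	sum := sha1.Sum([]byte(fmt.Sprintf("%s:%d", pluginName, line)))
	return fmt.Sprintf("missing_id_%x", sum[:4])
}

func main() {
	// The same plugin and line always yield the same ID across runs.
	fmt.Println(missingID("stdout", 18))
	fmt.Println(missingID("stdout", 22))
}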

breml added a commit to breml/logstash-filter-verifier that referenced this issue Dec 13, 2021
breml added a commit to breml/logstash-filter-verifier that referenced this issue Dec 13, 2021
@breml
Collaborator

breml commented Dec 13, 2021

@jgough Your last comment finally gave me the hint I needed to check the code again, and I discovered another bug. It was never the intention that the IDs of the output plugins change randomly, but that is what actually happened. The reason is that in https://github.com/magnusbaeck/logstash-filter-verifier/blob/master/internal/daemon/logstashconfig/file.go#L138 I generate a random ID as a last resort if an output does not have an ID. One should never end up there, though, because with --add-missing-id the missing IDs are already generated using a simple counter, which should lead to deterministic ordering. The problem was that the modified config (with the generated IDs) never ended up in the config that is actually used for the tests.

This should now be fixed with PR #173.

@jgough
Author

jgough commented Dec 13, 2021

Verified with your branch that the test above is now deterministic. Thanks for the explanation and the fix!

Will this ordering also be stable under config changes? I.e. if someone adds another filter to their config (which may change the numbering of the ID counter), might the ordering change? Is it sorted alphabetically? If so, two outputs (id_missing_8, id_missing_9) might become (id_missing_9, id_missing_10), and then the alphabetical ordering changes.

This could potentially force someone to swap the ordering across their entire test suite if the ordering suddenly changes. If that is the case, is it worth zero-padding these IDs?

breml added a commit that referenced this issue Dec 14, 2021
@breml
Collaborator

breml commented Dec 14, 2021

First of all, omitting the ID is not encouraged with LFV v2, and the --add-missing-id flag is, in my opinion, only a workaround for the transition phase. That being said, for a single test I think the ordering will be fine in most cases when only new outputs are added, because the relative ordering of the outputs will stay the same (they are ordered alphabetically, which is why I added %0000d to get some leading zeros there). The test cases will only need to change when the two relevant outputs swap positions, e.g. because the outputs are reordered within a file, or because files are renamed or moved to different folders.

@jgough
Author

jgough commented Dec 14, 2021

Don't we want %04d for leading zeros?

>> fmt.Printf("%0000d, %04d", 12, 12)
12, 0012

@breml
Collaborator

breml commented Dec 14, 2021

Fixed it in #174
