
lfv_fields_not_found error when using codec that outputs multiple events #147

Open
jgough opened this issue Sep 24, 2021 · 6 comments · May be fixed by #156

jgough commented Sep 24, 2021

When building from source from master, if a codec is specified that outputs multiple events, the generated fields.json file may not contain enough entries, resulting in an lfv_fields_not_found tag.

For example with the following input:

  stdin {
    id => "input"
    codec => multiline {
      pattern => "^\s"
      what => "previous"
    }
  }

and the following test:

input_plugin: "input"
fields:
  test: true
ignore:
  - "@timestamp"
testcases:
  - input:
    - |
      This is message 1 and
        this is message 1 also
      This is message 2 and
        this is message 2 also
    expected:
      - message: "This is message 1 and\n  this is message 1 also"
        tags: ["multiline"]
        test: true
      - message: "This is message 2 and\n  this is message 2 also"
        tags: ["multiline"]
        test: true

The following is output:

☑ Comparing message 1 of 2 from test1.yml
☐ Comparing message 2 of 2 from test1.yml:
--- /tmp/1371322176/test1.yml/2/expected        2021-09-24 16:59:01.007682196 +0000
+++ /tmp/1371322176/test1.yml/2/actual  2021-09-24 16:59:01.007682196 +0000
@@ -1,7 +1,7 @@
 {
   "message": "This is message 2 and\n  this is message 2 also",
   "tags": [
-    "multiline"
-  ],
-  "test": true
+    "multiline",
+    "lfv_fields_not_found"
+  ]
 }


Summary: ☐ All tests: 1/2
         ☐ test1.yml: 1/2
logstash-filter-verifier: error: failed test cases

This seems to be because fields.json is set to contain {"0":{"test":true}} and there is no second value for the second message. I know LFV has no concept that there are two events here. Is there any solution to this problem?
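
To illustrate my reading of it (I have not traced this in the code, so the exact layout is an assumption on my part), the fields file appears to be keyed by event index, so something like this is generated today:

  {"0": {"test": true}}

whereas the second event would presumably need a second entry:

  {"0": {"test": true}, "1": {"test": true}}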

You could argue that this should be two different testcase input values, but given that Logstash produced two outputs for one input here, it could also be argued that this should be supported. What are your thoughts?
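
For reference, the "two different testcase input values" alternative would look something like the sketch below (untested; whether the multiline codec flushes the last event of each testcase on its own is part of what I am unsure about):

testcases:
  - input:
    - |
      This is message 1 and
        this is message 1 also
    expected:
      - message: "This is message 1 and\n  this is message 1 also"
        tags: ["multiline"]
        test: true
  - input:
    - |
      This is message 2 and
        this is message 2 also
    expected:
      - message: "This is message 2 and\n  this is message 2 also"
        tags: ["multiline"]
        test: true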

Dockerfile to repro:

# syntax=docker/dockerfile:1.3-labs
FROM golang:1.17 as lfv_builder

RUN git clone https://github.com/magnusbaeck/logstash-filter-verifier.git
RUN apt update && apt install -y protobuf-compiler
RUN useradd -m -s /bin/bash go && chown -R go /go

USER go
WORKDIR /go/logstash-filter-verifier
RUN make && make check test

#------------------------------------------------------#

FROM docker.elastic.co/logstash/logstash:7.10.2

COPY --from=lfv_builder /go/logstash-filter-verifier /usr/bin

RUN <<EOF
mkdir tests
mkdir pipeline/pipeline1

cat <<EOT > /usr/share/logstash/config/pipelines.yml
- pipeline.id: pipeline1
  path.config: "pipeline/pipeline1/*.conf"
EOT

cat <<EOT > /usr/share/logstash/tests/test1.yml
input_plugin: "input"
fields:
  test: true
ignore:
  - "@timestamp"
testcases:
  - input:
    - |
      This is message 1 and
        this is message 1 also
      This is message 2 and
        this is message 2 also
    expected:
      - message: "This is message 1 and\n  this is message 1 also"
        tags: ["multiline"]
        test: true
      - message: "This is message 2 and\n  this is message 2 also"
        tags: ["multiline"]
        test: true
EOT

cat <<EOT > /usr/share/logstash/pipeline/pipeline1/01-input.conf
input {
  stdin {
    id => "input"
    codec => multiline {
      pattern => "^\s"
      what => "previous"
    }
  }
}

output { stdout {} }
EOT

cat <<EOT > /usr/share/logstash/run_tests.sh
logstash-filter-verifier daemon start --no-cleanup &
sleep 5
logstash-filter-verifier daemon run --pipeline /usr/share/logstash/config/pipelines.yml --pipeline-base /usr/share/logstash/ --testcase-dir /usr/share/logstash/tests/test1.yml --add-missing-id
EOT

EOF

CMD ["/bin/bash", "/usr/share/logstash/run_tests.sh"]
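
To build and run the repro (BuildKit has to be enabled for the heredoc syntax; the image name is arbitrary):

DOCKER_BUILDKIT=1 docker build -t lfv-147 .
docker run --rm lfv-147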

jgough commented Sep 26, 2021

Further thoughts on this: what I'm really trying to test here is that my multiline codec works correctly. By splitting it up into multiple testcases I'm not testing this in a way that makes logical sense to me (although it may well be equivalent).


breml commented Oct 18, 2021

This might be related to #149. The generator input approach does not work well with codecs, nor with inputs that contain characters that are "special" from the Logstash config format's point of view. As mentioned in #149, we might need to replace the generator input with something else (e.g. the file input in read mode).
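
To make that idea a bit more concrete, a replacement based on the file input in read mode could look roughly like this (only a sketch for discussion; the path is made up and stands for some temporary file that LFV would write the testcase input lines to):

input {
  file {
    # hypothetical temporary file containing the testcase input lines
    path => "/tmp/lfv/testcase-input.log"
    mode => "read"
    file_completed_action => "delete"
    id => "input"
  }
}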

@nicolasreich

I encountered a similar issue trying to test a configuration looking like this:

input {
  file {
    path => "/var/log/db_logs_logstash/*.csv"
    mode => "read"
    file_completed_action => "delete"
    id => "file"
  }
}

filter {
  csv {
      separator => ";"
      autodetect_column_names => true
      skip_empty_columns => true
      id => csv
  }
}

It worked by setting the input over two lines, i.e. looking like this:

---
ignore:
  - "@timestamp"
testcases:
  - input:
      - "TYP_INST;DBNAME;OS_USERNAME;USERNAME;USERHOST;TERMINAL;LOGON_TIME;LOGOFF_TIME;OS_PROCESS;CLIENT_ID"
      - "Prod;myBeautifulDB;myOS;myUsername;myUserhost;;22/08/2021 14:14:43;22/08/2021 14:14:43;4468;"
    expected:
      - message: "Prod;myBeautifulDB;myOS;myUsername;myUserhost;;22/08/2021 14:14:43;22/08/2021 14:14:43;4468;"
        LOGSTASH_DB_LOGS: "true"
        TYP_INST: "Prod"
        DBNAME: "myBeautifulDB"
        OS_USERNAME: "myOS"
        USERNAME: "myUsername"
        USERHOST: "myUserhost"
        LOGON_TIME: "22/08/2021 14:14:43"
        LOGOFF_TIME: "22/08/2021 14:14:43"
        OS_PROCESS: "4468"
input_plugin: file

So in the example from the initial post, it would be:

input_plugin: "input"
fields:
  test: true
ignore:
  - "@timestamp"
testcases:
  - input:
    - "This is message 1 and"
    - "  this is message 1 also"
    - "This is message 2 and"
    - "  this is message 2 also"
    expected:
      - message: "This is message 1 and\n  this is message 1 also"
        tags: ["multiline"]
        test: true
      - message: "This is message 2 and\n  this is message 2 also"
        tags: ["multiline"]
        test: true

Although I agree it is somewhat counterintuitive.

breml added a commit to breml/logstash-filter-verifier that referenced this issue Oct 22, 2021
breml added the bug label Oct 22, 2021
breml added a commit to breml/logstash-filter-verifier that referenced this issue Oct 26, 2021
breml added a commit to breml/logstash-filter-verifier that referenced this issue Nov 1, 2021

breml commented Nov 1, 2021

x input lines can lead to y input events, which can lead to z output events

There is no direct relationship between:

- the number of lines of input
- the number of events generated at the input (e.g. multiline codec)
- the number of output events emitted from Logstash (e.g. clone & drop filter)

The number of events emitted from Logstash (z) can be derived from the number of expected events. But there is currently no good way to reliably determine the number of events that will be generated by the input plugin (including the codec). This makes it hard to prepare the correct number of field sets to add to the events.
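
As an illustration of that last point (just a sketch, not taken from an actual test setup): with a filter like the one below, the two events produced by the multiline codec in the example above would become four output events, while a drop filter could just as well reduce the count.

filter {
  clone {
    # each event is emitted once as the original and once as a clone,
    # so 2 events coming out of the codec become 4 output events
    clones => ["copy"]
    id => "clone"
  }
}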

Maybe it is necessary to separate the logic that adds the global fields (to the event) from the logic for the per-test-case fields. Additionally, it might become necessary to create a separate input for each test case.

Related #142

breml added a commit to breml/logstash-filter-verifier that referenced this issue Dec 20, 2021

breml commented Dec 20, 2021

In #156 I currently have a version which handles the multiline codec better and also maps the fields to the correct input by having a separate input plugin for each input.

The downside of this approach is that, for a multiline input, the following form always needs to be used:

  - input:
      - |-
        TYP_INST;DBNAME;OS_USERNAME;USERNAME;USERHOST;TERMINAL;LOGON_TIME;LOGOFF_TIME;OS_PROCESS;CLIENT_ID
        Prod;myBeautifulDB;myOS;myUsername;myUserhost;;22/08/2021 14:14:43;22/08/2021 14:14:43;4468;

This form no longer works (because the two lines are not processed by the same input and therefore not by the same codec):

  - input:
      - TYP_INST;DBNAME;OS_USERNAME;USERNAME;USERHOST;TERMINAL;LOGON_TIME;LOGOFF_TIME;OS_PROCESS;CLIENT_ID
      - Prod;myBeautifulDB;myOS;myUsername;myUserhost;;22/08/2021 14:14:43;22/08/2021 14:14:43;4468;

As mentioned in the previous comment, there are a lot of different combinations that work against each other, and it looks like there is no solution where all of the combinations work.


jgough commented Dec 20, 2021

The first format makes more logical sense to me, with all those lines being part of one input. Does this mean that in my test above you would have to use:

    - |
      This is message 1 and
        this is message 1 also
    - |
      This is message 2 and
        this is message 2 also

For two inputs?
