
lfv_fields_not_found error when using codec that outputs multiple events #147

Open
jgough opened this issue Sep 24, 2021 · 6 comments · May be fixed by #156

jgough commented Sep 24, 2021

When building from source from master, if a codec is specified that outputs multiple events, the generated fields.json file may not contain enough entries, resulting in an lfv_fields_not_found tag.

For example with the following input:

  stdin {
    id => "input"
    codec => multiline {
      pattern => "^\s"
      what => "previous"
    }
  }

and the following test:

input_plugin: "input"
fields:
  test: true
ignore:
  - "@timestamp"
testcases:
  - input:
    - |
      This is message 1 and
        this is message 1 also
      This is message 2 and
        this is message 2 also
    expected:
      - message: "This is message 1 and\n  this is message 1 also"
        tags: ["multiline"]
        test: true
      - message: "This is message 2 and\n  this is message 2 also"
        tags: ["multiline"]
        test: true

The following is output:

☑ Comparing message 1 of 2 from test1.yml
☐ Comparing message 2 of 2 from test1.yml:
--- /tmp/1371322176/test1.yml/2/expected        2021-09-24 16:59:01.007682196 +0000
+++ /tmp/1371322176/test1.yml/2/actual  2021-09-24 16:59:01.007682196 +0000
@@ -1,7 +1,7 @@
 {
   "message": "This is message 2 and\n  this is message 2 also",
   "tags": [
-    "multiline"
-  ],
-  "test": true
+    "multiline",
+    "lfv_fields_not_found"
+  ]
 }


Summary: ☐ All tests: 1/2
         ☐ test1.yml: 1/2
logstash-filter-verifier: error: failed test cases

This seems to be because fields.json is set to contain {"0":{"test":true}} and there is no second value for the second message. I know LFV has no concept that there are two events here. Is there any solution to this problem?
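
To illustrate my reading of it (I have not traced this in the code, so the exact layout is an assumption on my part), the fields file appears to be keyed by event index, so something like this is generated today:

  {"0": {"test": true}}

whereas the second event would presumably need a second entry:

  {"0": {"test": true}, "1": {"test": true}}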

You could argue that this should be two different testcase input values, but given that Logstash produced two outputs for one input here, it could also be argued that this should be supported. What are your thoughts?
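
For reference, the "two different testcase input values" alternative would look something like the sketch below (untested; whether the multiline codec flushes the last event of each testcase on its own is part of what I am unsure about):

testcases:
  - input:
    - |
      This is message 1 and
        this is message 1 also
    expected:
      - message: "This is message 1 and\n  this is message 1 also"
        tags: ["multiline"]
        test: true
  - input:
    - |
      This is message 2 and
        this is message 2 also
    expected:
      - message: "This is message 2 and\n  this is message 2 also"
        tags: ["multiline"]
        test: true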

Dockerfile to repro:

# syntax=docker/dockerfile:1.3-labs
FROM golang:1.17 as lfv_builder

RUN git clone https://github.com/magnusbaeck/logstash-filter-verifier.git
RUN apt update && apt install -y protobuf-compiler
RUN useradd -m -s /bin/bash go && chown -R go /go

USER go
WORKDIR /go/logstash-filter-verifier
RUN make && make check test

#------------------------------------------------------#

FROM docker.elastic.co/logstash/logstash:7.10.2

COPY --from=lfv_builder /go/logstash-filter-verifier /usr/bin

RUN <<EOF
mkdir tests
mkdir pipeline/pipeline1

cat <<EOT > /usr/share/logstash/config/pipelines.yml
- pipeline.id: pipeline1
  path.config: "pipeline/pipeline1/*.conf"
EOT

cat <<EOT > /usr/share/logstash/tests/test1.yml
input_plugin: "input"
fields:
  test: true
ignore:
  - "@timestamp"
testcases:
  - input:
    - |
      This is message 1 and
        this is message 1 also
      This is message 2 and
        this is message 2 also
    expected:
      - message: "This is message 1 and\n  this is message 1 also"
        tags: ["multiline"]
        test: true
      - message: "This is message 2 and\n  this is message 2 also"
        tags: ["multiline"]
        test: true
EOT

cat <<EOT > /usr/share/logstash/pipeline/pipeline1/01-input.conf
input {
  stdin {
    id => "input"
    codec => multiline {
      pattern => "^\s"
      what => "previous"
    }
  }
}

output { stdout {} }
EOT

cat <<EOT > /usr/share/logstash/run_tests.sh
logstash-filter-verifier daemon start --no-cleanup &
sleep 5
logstash-filter-verifier daemon run --pipeline /usr/share/logstash/config/pipelines.yml --pipeline-base /usr/share/logstash/ --testcase-dir /usr/share/logstash/tests/test1.yml --add-missing-id
EOT

EOF

CMD ["/bin/bash", "/usr/share/logstash/run_tests.sh"]
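
To build and run the repro (BuildKit has to be enabled for the heredoc syntax; the image name is arbitrary):

DOCKER_BUILDKIT=1 docker build -t lfv-147 .
docker run --rm lfv-147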

jgough commented Sep 26, 2021

Further thoughts on this: what I'm really trying to test here is that my multiline codec works correctly. By splitting it up into multiple testcases I'm not testing this in a way that makes logical sense to me (although it may well be equivalent).


breml commented Oct 18, 2021

This might be related to #149. The generator input approach does not work well with codecs, nor with inputs that contain characters that are "special" from the Logstash config format's point of view. As mentioned in #149, we might need to replace the generator input with something else (e.g. the file input in read mode).
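
To make that idea a bit more concrete, a replacement based on the file input in read mode could look roughly like this (only a sketch for discussion; the path is made up and stands for some temporary file that LFV would write the testcase input lines to):

input {
  file {
    # hypothetical temporary file containing the testcase input lines
    path => "/tmp/lfv/testcase-input.log"
    mode => "read"
    file_completed_action => "delete"
    id => "input"
  }
}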

@nicolasreich

I encountered a similar issue trying to test a configuration looking like this:

input {
  file {
    path => "/var/log/db_logs_logstash/*.csv"
    mode => "read"
    file_completed_action => "delete"
    id => "file"
  }
}

filter {
  csv {
      separator => ";"
      autodetect_column_names => true
      skip_empty_columns => true
      id => csv
  }
}

It worked by setting the input over two lines, i.e. looking like this:

---
ignore:
  - "@timestamp"
testcases:
  - input:
      - "TYP_INST;DBNAME;OS_USERNAME;USERNAME;USERHOST;TERMINAL;LOGON_TIME;LOGOFF_TIME;OS_PROCESS;CLIENT_ID"
      - "Prod;myBeautifulDB;myOS;myUsername;myUserhost;;22/08/2021 14:14:43;22/08/2021 14:14:43;4468;"
    expected:
      - message: "Prod;myBeautifulDB;myOS;myUsername;myUserhost;;22/08/2021 14:14:43;22/08/2021 14:14:43;4468;"
        LOGSTASH_DB_LOGS: "true"
        TYP_INST: "Prod"
        DBNAME: "myBeautifulDB"
        OS_USERNAME: "myOS"
        USERNAME: "myUsername"
        USERHOST: "myUserhost"
        LOGON_TIME: "22/08/2021 14:14:43"
        LOGOFF_TIME: "22/08/2021 14:14:43"
        OS_PROCESS: "4468"
input_plugin: file

So in the example from the initial post, it would be:

input_plugin: "input"
fields:
  test: true
ignore:
  - "@timestamp"
testcases:
  - input:
    - "This is message 1 and"
    - "  this is message 1 also"
    - "This is message 2 and"
    - "  this is message 2 also"
    expected:
      - message: "This is message 1 and\n  this is message 1 also"
        tags: ["multiline"]
        test: true
      - message: "This is message 2 and\n  this is message 2 also"
        tags: ["multiline"]
        test: true

Although I agree it is somewhat counterintuitive.

breml added a commit to breml/logstash-filter-verifier that referenced this issue Oct 22, 2021
breml added the bug label Oct 22, 2021
breml added a commit to breml/logstash-filter-verifier that referenced this issue Oct 26, 2021
breml added a commit to breml/logstash-filter-verifier that referenced this issue Nov 1, 2021

breml commented Nov 1, 2021

x input lines can lead to y input events, which can lead to z output events

There is no direct relationship between:

- the number of lines of input
- the number of events generated at the input (e.g. multiline codec)
- the number of output events emitted from Logstash (e.g. clone & drop filter)

The number of events emitted from Logstash (z) can be derived from the number of expected events. But there is currently no good way to reliably determine the number of events that will be generated by the input plugin (including the codec). This makes it hard to prepare the correct number of field sets to add to the events.
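
As an illustration of that last point (just a sketch, not taken from an actual test setup): with a filter like the one below, the two events produced by the multiline codec in the example above would become four output events, while a drop filter could just as well reduce the count.

filter {
  clone {
    # each event is emitted once as the original and once as a clone,
    # so 2 events coming out of the codec become 4 output events
    clones => ["copy"]
    id => "clone"
  }
}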

Maybe it is necessary to separate the logic that adds the global fields (to the event) from the logic for the per-test-case fields. Additionally, it might become necessary to create a separate input for each test case.

Related #142

breml added a commit to breml/logstash-filter-verifier that referenced this issue Dec 20, 2021

breml commented Dec 20, 2021

In #156 I currently have a version which handles the multiline codec better and also maps the fields to the correct input by having a separate input plugin for each input.

The downside of this approach is that, for a multiline input, the following form always needs to be used:

  - input:
      - |-
        TYP_INST;DBNAME;OS_USERNAME;USERNAME;USERHOST;TERMINAL;LOGON_TIME;LOGOFF_TIME;OS_PROCESS;CLIENT_ID
        Prod;myBeautifulDB;myOS;myUsername;myUserhost;;22/08/2021 14:14:43;22/08/2021 14:14:43;4468;

This form no longer works (because the two lines are not processed by the same input and therefore not by the same codec):

  - input:
      - TYP_INST;DBNAME;OS_USERNAME;USERNAME;USERHOST;TERMINAL;LOGON_TIME;LOGOFF_TIME;OS_PROCESS;CLIENT_ID
      - Prod;myBeautifulDB;myOS;myUsername;myUserhost;;22/08/2021 14:14:43;22/08/2021 14:14:43;4468;

As mentioned in the previous comment, there are a lot of different combinations that work against each other, and it looks like there is no solution where all of the combinations work.


jgough commented Dec 20, 2021

The first format makes more logical sense to me, with all those lines being part of one input. Does this mean that in my test above you would have to use:

    - |
      This is message 1 and
        this is message 1 also
    - |
      This is message 2 and
        this is message 2 also

For two inputs?
