Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extend enum support #144

Merged
merged 4 commits into from
Jun 3, 2024
Merged

extend enum support #144

merged 4 commits into from
Jun 3, 2024

Conversation

gpop63
Copy link
Contributor

@gpop63 gpop63 commented May 5, 2024

Extended enum support for:

  • long
  • double

How I tested it

Using aws.billing schema-b from assets/templates.

Added a random new field in fields.yml:

- name: aws.billing.group_by.TEST
  type: double

Added only test in aws.billing.group_definition.key in configs.yml.

  - name: aws.billing.group_definition.key
    # NOTE: repeated values are needed to produce 10% cases with "" value
    enum: ["TEST"]

And some enum values for TEST:

  - name: aws.billing.group_by.TEST
    enum: [50.32, 32.25]

Added an extra else if in gotext.tpl.

{{- else if eq $groupBy "TEST"}}
              "TEST": {{generate "aws.billing.group_by.TEST"}}

Command to run:

go run main.go generate-with-template ./assets/templates/aws.billing/schema-b/gotext.tpl ./assets/templates/aws.billing/schema-b/fields.yml --config-file ./assets/templates/aws.billing/schema-b/configs.yml --tot-events 1000

Closes #142

@gpop63 gpop63 marked this pull request as ready for review May 6, 2024 10:35
@gpop63 gpop63 requested a review from a team as a code owner May 6, 2024 10:35
@lalit-satapathy lalit-satapathy requested review from devamanv and shmsr May 8, 2024 05:17
@gpop63
Copy link
Contributor Author

gpop63 commented May 20, 2024

@shmsr the tests sometimes fail because the time generator is generating a timestamp that is slightly before the current time. Maybe we should look into this as well to fix it. It can be reproduced locally. Rerunning them a few times will make them pass.

?       [github.com/elastic/elastic-integration-corpus-generator-tool](http://github.com/elastic/elastic-integration-corpus-generator-tool)    [no test files]
?       [github.com/elastic/elastic-integration-corpus-generator-tool/pkg/genlib/fields](http://github.com/elastic/elastic-integration-corpus-generator-tool/pkg/genlib/fields)  [no test files]
ok      [github.com/elastic/elastic-integration-corpus-generator-tool/cmd](http://github.com/elastic/elastic-integration-corpus-generator-tool/cmd)        0.004s
ok      [github.com/elastic/elastic-integration-corpus-generator-tool/internal/corpus](http://github.com/elastic/elastic-integration-corpus-generator-tool/internal/corpus)    0.003s
ok      [github.com/elastic/elastic-integration-corpus-generator-tool/internal/settings](http://github.com/elastic/elastic-integration-corpus-generator-tool/internal/settings)  0.003s
ok      [github.com/elastic/elastic-integration-corpus-generator-tool/internal/version](http://github.com/elastic/elastic-integration-corpus-generator-tool/internal/version)   0.002s
2024/05/05 23:10:54 rand seed generator initialised with value `4029003210993349982`
2024/05/05 23:10:54 time now generator initialised with value `2024-05-05T20:10:54.918326267Z`
--- FAIL: Test_FieldDateWithCustomTemplate (0.00s)
    generator_with_custom_template_test.go:442: with template: {"alpha":"{{.alpha}}"}
    generator_with_custom_template_test.go:455: Data generated before now, diff: -267ns
FAIL
FAIL    [github.com/elastic/elastic-integration-corpus-generator-tool/pkg/genlib](http://github.com/elastic/elastic-integration-corpus-generator-tool/pkg/genlib) 1.753s
ok      [github.com/elastic/elastic-integration-corpus-generator-tool/pkg/genlib/config](http://github.com/elastic/elastic-integration-corpus-generator-tool/pkg/genlib/config)  0.005s
FAIL

@shmsr
Copy link
Member

shmsr commented May 21, 2024

@shmsr the tests sometimes fail because the time generator is generating a timestamp that is slightly before the current time. Maybe we should look into this as well to fix it. It can be reproduced locally. Rerunning them a few times will make them pass.

?       [github.com/elastic/elastic-integration-corpus-generator-tool](http://github.com/elastic/elastic-integration-corpus-generator-tool)    [no test files]
?       [github.com/elastic/elastic-integration-corpus-generator-tool/pkg/genlib/fields](http://github.com/elastic/elastic-integration-corpus-generator-tool/pkg/genlib/fields)  [no test files]
ok      [github.com/elastic/elastic-integration-corpus-generator-tool/cmd](http://github.com/elastic/elastic-integration-corpus-generator-tool/cmd)        0.004s
ok      [github.com/elastic/elastic-integration-corpus-generator-tool/internal/corpus](http://github.com/elastic/elastic-integration-corpus-generator-tool/internal/corpus)    0.003s
ok      [github.com/elastic/elastic-integration-corpus-generator-tool/internal/settings](http://github.com/elastic/elastic-integration-corpus-generator-tool/internal/settings)  0.003s
ok      [github.com/elastic/elastic-integration-corpus-generator-tool/internal/version](http://github.com/elastic/elastic-integration-corpus-generator-tool/internal/version)   0.002s
2024/05/05 23:10:54 rand seed generator initialised with value `4029003210993349982`
2024/05/05 23:10:54 time now generator initialised with value `2024-05-05T20:10:54.918326267Z`
--- FAIL: Test_FieldDateWithCustomTemplate (0.00s)
    generator_with_custom_template_test.go:442: with template: {"alpha":"{{.alpha}}"}
    generator_with_custom_template_test.go:455: Data generated before now, diff: -267ns
FAIL
FAIL    [github.com/elastic/elastic-integration-corpus-generator-tool/pkg/genlib](http://github.com/elastic/elastic-integration-corpus-generator-tool/pkg/genlib) 1.753s
ok      [github.com/elastic/elastic-integration-corpus-generator-tool/pkg/genlib/config](http://github.com/elastic/elastic-integration-corpus-generator-tool/pkg/genlib/config)  0.005s
FAIL

We can use this: https://github.com/elastic/beats/blob/main/x-pack/filebeat/input/salesforce/helper.go

@gpop63
Copy link
Contributor Author

gpop63 commented May 21, 2024

We can use this: https://github.com/elastic/beats/blob/main/x-pack/filebeat/input/salesforce/helper.go

@shmsr Not sure if it's that straightforward. We would have to mock it in multiple places. Or maybe even mock the generator?

testSingleTWithCustomTemplate is being called in the test

func testSingleTWithCustomTemplate[T any](t *testing.T, fld Field, yaml []byte, template []byte) T {

which uses an actual implementation of a generator that calls the emit function to create timestamps

func (gen *GeneratorWithCustomTemplate) emit(buf *bytes.Buffer) error {

the actual func that creates the timestamps is this one

func nearTime(fieldCfg ConfigField, state *genState) time.Time {

Maybe we can add a bit of tolerance and change the test to something like this

tolerance := time.Millisecond
if diff < -tolerance || diff > FieldTypeDurationSpan*time.Millisecond+tolerance {
    t.Errorf("Data generated before now, diff: %v", diff)
}

or we could truncate previous

previous := timeNowToBind.Truncate(time.Millisecond)

@shmsr
Copy link
Member

shmsr commented May 22, 2024

Changes related to enum look more or less good. Here's a diff patch but only refer enum related changes as there a few more unrelated changes. For long, I think we should go with ParseInt instead of ParseFloat.

diff --git a/pkg/genlib/generator_interface.go b/pkg/genlib/generator_interface.go
index 5acd7db..4a56c2f 100644
--- a/pkg/genlib/generator_interface.go
+++ b/pkg/genlib/generator_interface.go
@@ -99,7 +99,6 @@ func newGenState() *genState {
 }
 
 func bindField(cfg Config, field Field, fieldMap map[string]any, withReturn bool) error {
-
 	// Check for hardcoded field value
 	if len(field.Value) > 0 {
 		if withReturn {
@@ -164,7 +163,6 @@ func isDupeInterface(va []any, dst any) bool {
 }
 
 func bindByType(cfg Config, field Field, fieldMap map[string]any) (err error) {
-
 	fieldCfg, _ := cfg.GetField(field.Name)
 
 	switch field.Type {
@@ -194,7 +192,6 @@ func bindByType(cfg Config, field Field, fieldMap map[string]any) (err error) {
 }
 
 func bindByTypeWithReturn(cfg Config, field Field, fieldMap map[string]any) (err error) {
-
 	fieldCfg, _ := cfg.GetField(field.Name)
 
 	switch field.Type {
@@ -300,7 +297,6 @@ func bindObject(cfg Config, fieldCfg ConfigField, field Field, fieldMap map[stri
 }
 
 func bindDynamicObject(cfg Config, field Field, fieldMap map[string]any) error {
-
 	// Temporary fieldMap which we pass to the bind function,
 	// then extract the generated emitFunction for use in the stub.
 	dynMap := make(map[string]any)
@@ -315,7 +311,6 @@ func bindDynamicObject(cfg Config, field Field, fieldMap map[string]any) error {
 }
 
 func genNounsN(n int, buf *bytes.Buffer) {
-
 	for i := 0; i < n-1; i++ {
 		buf.WriteString(randomdata.Noun())
 		buf.WriteByte(' ')
@@ -414,6 +409,7 @@ func totWordsAndJoiner(fieldExample string) (int, string) {
 
 	return totWords, joiner
 }
+
 func bindJoinRand(field Field, N int, joiner string, fieldMap map[string]any) error {
 	var emitFNotReturn emitFNotReturn
 	emitFNotReturn = func(state *genState, buf *bytes.Buffer) error {
@@ -518,7 +514,6 @@ func nearTime(fieldCfg ConfigField, state *genState) time.Time {
 		} else {
 			fieldCfg.Period = timeNowToBind.UTC().Sub(from.UTC())
 		}
-
 	}
 
 	if errFrom != nil && errTo == nil {
@@ -735,7 +730,6 @@ func bindDouble(fieldCfg ConfigField, field Field, fieldMap map[string]any) erro
 }
 
 func bindCardinality(cfg Config, field Field, fieldMap map[string]any) error {
-
 	fieldCfg, _ := cfg.GetField(field.Name)
 	cardinality := fieldCfg.Cardinality
 
@@ -958,6 +952,7 @@ func bindIPWithReturn(field Field, fieldMap map[string]any) error {
 	fieldMap[field.Name] = emitF
 	return nil
 }
+
 func randIP() (int, int, int, int) {
 	i0 := customRand.Intn(255)
 	i1 := customRand.Intn(255)
@@ -966,27 +961,29 @@ func randIP() (int, int, int, int) {
 
 	return i0, i1, i2, i3
 }
+
 func bindLongWithReturn(fieldCfg ConfigField, field Field, fieldMap map[string]any) error {
 	if err := fieldCfg.ValidCounter(); err != nil {
 		return err
 	}
 
+	// If an enum is provided, use it to generate random values from the enum
 	if len(fieldCfg.Enum) > 0 {
 		var emitF emitF
 		emitF = func(state *genState) any {
 			idx := customRand.Intn(len(fieldCfg.Enum))
-			f, err := strconv.ParseFloat(fieldCfg.Enum[idx], 64)
+			f, err := strconv.ParseInt(fieldCfg.Enum[idx], 10, 64)
 			if err != nil {
 				return err
 			}
-			return int64(f)
+			return f
 		}
 
 		fieldMap[field.Name] = emitF
-
 		return nil
 	}
 
+	// If the field is marked as a counter, generate sequential values
 	if fieldCfg.Counter {
 		var emitF emitF
 
@@ -1001,7 +998,6 @@ func bindLongWithReturn(fieldCfg ConfigField, field Field, fieldMap map[string]a
 
 			if fieldCfg.Fuzziness <= 0 {
 				dummyFunc = makeIntCounterFunc(previous, field)
-
 				dummyInt = dummyFunc()
 			} else {
 				dummyInt = fuzzyIntCounter(previous, fieldCfg.Fuzziness)
@@ -1012,10 +1008,10 @@ func bindLongWithReturn(fieldCfg ConfigField, field Field, fieldMap map[string]a
 		}
 
 		fieldMap[field.Name] = emitF
-
 		return nil
 	}
 
+	// If the field is not a counter, generate random values within the specified range
 	dummyFunc := makeIntFunc(fieldCfg, field)
 
 	if fieldCfg.Fuzziness <= 0 {
@@ -1090,37 +1086,29 @@ func bindDoubleWithReturn(fieldCfg ConfigField, field Field, fieldMap map[string
 	}
 
 	if len(fieldCfg.Enum) > 0 {
-		var emitF emitF
-		emitF = func(state *genState) any {
+		emitF := func(state *genState) any {
 			idx := customRand.Intn(len(fieldCfg.Enum))
 			f, err := strconv.ParseFloat(fieldCfg.Enum[idx], 64)
 			if err != nil {
-				return err
+				// Handle parsing error, e.g., log or return a default value
+				return 0.0
 			}
-
 			return f
 		}
-
 		fieldMap[field.Name] = emitF
-
 		return nil
 	}
 
 	if fieldCfg.Counter {
-		var emitF emitF
-
-		emitF = func(state *genState) any {
+		emitF := func(state *genState) any {
 			previous := float64(1)
-			var dummyFloat float64
-			var dummyFunc func() float64
-
 			if previousDummyFloat, ok := state.prevCache[field.Name].(float64); ok {
 				previous = previousDummyFloat
 			}
 
+			var dummyFloat float64
 			if fieldCfg.Fuzziness <= 0 {
-				dummyFunc = makeFloatCounterFunc(previous, field)
-
+				dummyFunc := makeFloatCounterFunc(previous, field)
 				dummyFloat = dummyFunc()
 			} else {
 				dummyFloat = fuzzyFloatCounter(previous, fieldCfg.Fuzziness)
@@ -1129,9 +1117,7 @@ func bindDoubleWithReturn(fieldCfg ConfigField, field Field, fieldMap map[string
 			state.prevCache[field.Name] = dummyFloat
 			return dummyFloat
 		}
-
 		fieldMap[field.Name] = emitF
-
 		return nil
 	}
 
@@ -1142,17 +1128,23 @@ func bindDoubleWithReturn(fieldCfg ConfigField, field Field, fieldMap map[string
 		emitF = func(state *genState) any {
 			return dummyFunc()
 		}
-
 		fieldMap[field.Name] = emitF
 
 		return nil
 	}
 
-	min, _ := fieldCfg.Range.MinAsFloat64()
-	max, _ := fieldCfg.Range.MaxAsFloat64()
+	min, err := fieldCfg.Range.MinAsFloat64()
+	if err != nil {
+		// Handle error
+		return err
+	}
+	max, err := fieldCfg.Range.MaxAsFloat64()
+	if err != nil {
+		// Handle error
+		return err
+	}
 
-	var emitF emitF
-	emitF = func(state *genState) any {
+	emitF := func(state *genState) any {
 		var dummyFloat float64
 		if previousDummyFloat, ok := state.prevCache[field.Name].(float64); ok {
 			dummyFloat = fuzzyFloat(previousDummyFloat, fieldCfg.Fuzziness, min, max)
@@ -1164,12 +1156,10 @@ func bindDoubleWithReturn(fieldCfg ConfigField, field Field, fieldMap map[string
 	}
 
 	fieldMap[field.Name] = emitF
-
 	return nil
 }
 
 func bindCardinalityWithReturn(cfg Config, field Field, fieldMap map[string]any) error {
-
 	fieldCfg, _ := cfg.GetField(field.Name)
 	cardinality := fieldCfg.Cardinality
 
@@ -1245,7 +1235,6 @@ func bindObjectWithReturn(cfg Config, fieldCfg ConfigField, field Field, fieldMa
 }
 
 func bindDynamicObjectWithReturn(cfg Config, field Field, fieldMap map[string]any) error {
-
 	// Temporary fieldMap which we pass to the bind function,
 	// then extract the generated emitFunction for use in the stub.
 	dynMap := make(map[string]any)

@shmsr
Copy link
Member

shmsr commented May 22, 2024

We can use this: https://github.com/elastic/beats/blob/main/x-pack/filebeat/input/salesforce/helper.go

@shmsr Not sure if it's that straightforward. We would have to mock it in multiple places. Or maybe even mock the generator?

testSingleTWithCustomTemplate is being called in the test

func testSingleTWithCustomTemplate[T any](t *testing.T, fld Field, yaml []byte, template []byte) T {

which uses an actual implementation of a generator that calls the emit function to create timestamps

func (gen *GeneratorWithCustomTemplate) emit(buf *bytes.Buffer) error {

the actual func that creates the timestamps is this one

func nearTime(fieldCfg ConfigField, state *genState) time.Time {

Maybe we can add a bit of tolerance and change the test to something like this

tolerance := time.Millisecond
if diff < -tolerance || diff > FieldTypeDurationSpan*time.Millisecond+tolerance {
    t.Errorf("Data generated before now, diff: %v", diff)
}

or we could truncate previous

previous := timeNowToBind.Truncate(time.Millisecond)

Hey, thanks. I did not take a deeper look. For now, let's add tolerance like you suggested but let's do it properly in upcoming enhancements that we are going to make. Please a NOTE comment if we are going with the tolerance change or the truncation.

@devamanv What do you suggest?

@shmsr
Copy link
Member

shmsr commented May 22, 2024

Also, from what I see there are so many code smells in the repo. We will file a separate PR to address most of the code smells as a single PR reported by staticcheck, golangci-lint, revive, vet, etc. Also need to gofumpt the code. But we will do it separately PR.

@gpop63
Copy link
Contributor Author

gpop63 commented May 22, 2024

@shmsr

I just checked and we should move the error check outside of emitF because if there is an error happening we would just return the error string as the value.

"group_by": {
              "TEST": strconv.ParseInt: parsing "44.32": invalid syntax
            }

Not sure if we should return a default value like 0 if parsing fails, if we move it outside of emitF we can just return and stop the process - signaling to the user that the config is wrong.

@shmsr
Copy link
Member

shmsr commented May 27, 2024

@shmsr

I just checked and we should move the error check outside of emitF because if there is an error happening we would just return the error string as the value.

"group_by": {
              "TEST": strconv.ParseInt: parsing "44.32": invalid syntax
            }

Not sure if we should return a default value like 0 if parsing fails, if we move it outside of emitF we can just return and stop the process - signaling to the user that the config is wrong.

Sorry for the late reply. I agree. We have to move the err check outside or do something else.

idx := customRand.Intn(len(fieldCfg.Enum))
f, err := strconv.ParseFloat(fieldCfg.Enum[idx], 64)
if err != nil {
return fmt.Errorf("field %s enum value is not a float: %w", fieldCfg.Name, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of float, should we use double in the sentence?

idx := customRand.Intn(len(fieldCfg.Enum))
f, err := strconv.ParseInt(fieldCfg.Enum[idx], 10, 64)
if err != nil {
return fmt.Errorf("field %s enum value is not an integer: %w", fieldCfg.Name, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of integer, should we use long in the sentence?

Copy link

@devamanv devamanv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@shmsr shmsr merged commit a33344d into elastic:main Jun 3, 2024
2 checks passed
@shmsr
Copy link
Member

shmsr commented Jun 3, 2024

@gpop63
Copy link
Contributor Author

gpop63 commented Jun 3, 2024

@shmsr Yeah it's the same test we discussed previously #144 (comment) --- FAIL: Test_FieldDateWithCustomTemplate (0.01s). Let's create an issue to address this test.

@shmsr
Copy link
Member

shmsr commented Jun 3, 2024

Oh, ok. Thanks. I'll create one if you haven't already opened one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extend enum support
3 participants