Compare templating approaches #39
Comments
An additional question I have is around the performance concerns discussed in the pull requests. Memory seems to be one of the main concerns. I assume this is less about loading and reading the templates themselves and more about data production?
I'm adding some context here, as all the PRs are a bit lacking in it. #36 is the initial PR @aspacca has been working on to add templating to the generator. Generating data from a template allows the tool to produce other data formats, like "source" data (i.e. a source VPC Flow log to be fed to our data collection). #36 brings in the initial refactoring of the generator, with the new generator being able to generate any output based on a template. By default a backward-compatible implementation is used. This generator:
This generator has a limitation: it does not support generating values based on other values, nor any conditional expression. We wanted to enable generating data similar to this Go implementation (the interesting features are in the comments):

    func (v *Vpcflow) randomize() {
        // [...]
        v.End = time.Now().Unix()
        v.Start = v.End - int64(rand.Intn(60)) // refer to a previously generated field
        v.Action = actions[rand.Intn(2)]       // select a value from a set
        if v.Packets == 0 {                    // perform boolean evaluation to select a value
            v.LogStatus = statuses[2]
        } else {
            v.LogStatus = statuses[rand.Intn(2)]
        }
    }

This led Andrea to create #38, which is #36 + JS-based expressions.

The code for #36 implements a very basic template engine (it uses regexp to "parse" the template and extract fields). This implementation is more error prone and, in Andrea's initial tests, outperformed the original implementation only marginally. This led me to think about using Go text/template.

We decided to push the experiments we did as they were and postpone the discussion/review of results and trade-offs. Some considerations:
Hope this makes the overall context a bit clearer.
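To make the three features from the Go snippet above concrete (referring to a previously generated field, selecting a value from a set, and choosing a value with a boolean check), here is a minimal sketch of how they might look with Go's text/template. This is not the code from any of the PRs: the template layout, the helper functions `now`, `randInt`, `sub` and `pick`, the function `renderLine`, and the ACCEPT/REJECT and OK/NODATA/SKIPDATA values are all assumptions made for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"math/rand"
	"text/template"
	"time"
)

// Hypothetical template: the field order and the literal values are
// illustrative and not taken from the PRs discussed in this issue.
const vpcflowTmpl = `{{- $end := now -}}
{{- $start := sub $end (randInt 60) -}}
{{- $packets := randInt 3 -}}
{{ $start }} {{ $end }} {{ $packets }} {{ pick "ACCEPT" "REJECT" }} {{ if eq $packets 0 }}NODATA{{ else }}{{ pick "OK" "SKIPDATA" }}{{ end }}`

// renderLine renders one line. The seed makes the random choices
// reproducible; end plays the role of v.End in the snippet above.
func renderLine(seed, end int64) (string, error) {
	rng := rand.New(rand.NewSource(seed))
	funcs := template.FuncMap{
		"now":     func() int64 { return end },
		"randInt": rng.Intn, // random int in [0, n)
		"sub":     func(a int64, b int) int64 { return a - int64(b) },
		"pick":    func(opts ...string) string { return opts[rng.Intn(len(opts))] },
	}
	tmpl, err := template.New("vpcflow").Funcs(funcs).Parse(vpcflowTmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, nil); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	line, err := renderLine(time.Now().UnixNano(), time.Now().Unix())
	if err != nil {
		panic(err)
	}
	fmt.Println(line)
}
```

Template variables (`$end`, `$start`, `$packets`) cover the "refer to a previously generated field" case, while `if`/`else` covers the conditional one, so no extra expression language is needed for this particular example.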
Yes. In general we can assume that the template parsing step needs to be done only once, while data generation happens at each iteration.
Thanks for all the details @endorama, super helpful. In elastic/elastic-package#984 (comment) I put together some thoughts on how things could work end to end in the context of elastic-package with these changes.
I'm going to close this, as the linked PRs have been closed and superseded by #41, where we make the final implementation: using Golang
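A rough sketch of that split, assuming Go's text/template: the template is parsed a single time, and only `Execute` runs in the loop, reusing one buffer across iterations. The `event` type, the `actions` values and the `generate` function are hypothetical names used only for this illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// event is a hypothetical record type used only for this sketch.
type event struct {
	ID     int
	Action string
}

// Illustrative values, not taken from the generator.
var actions = []string{"ACCEPT", "REJECT"}

// generate parses tmplText once, then executes it n times.
func generate(tmplText string, n int) ([]string, error) {
	tmpl, err := template.New("event").Parse(tmplText) // one-time cost
	if err != nil {
		return nil, err
	}
	var buf bytes.Buffer // reused across iterations to limit allocations
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		buf.Reset()
		ev := event{ID: i, Action: actions[i%len(actions)]}
		if err := tmpl.Execute(&buf, ev); err != nil { // per-iteration cost
			return nil, err
		}
		out = append(out, buf.String())
	}
	return out, nil
}

func main() {
	lines, err := generate("{{.ID}} {{.Action}}", 3)
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

A real generator would more likely stream each rendered event to its output instead of collecting the strings in a slice; the slice is used here only to keep the example easy to check.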
There are currently 3 open PRs related to templating for the corpus generation tool:
text/template package #37
I would like to use this issue to discuss the similarities and differences between the approaches.
My current understanding is that #37 uses Go's text/template as its template language, and #38 supports JavaScript expressions as part of the field names. What about #36? What are the other core differences between the approaches?