Is your feature request related to a problem? Please describe.
We currently receive more than 100 eventlogs per day, and each eventlog is GB-scale; the largest can reach 7 GB, and the smaller ones are around 1~2 GB.
The Profiling tool currently produces the full set of metrics extracted from the eventlogs, which usually takes a considerable amount of time.
Would it be possible to select only the parts we need, so the Profiling tool can skip a lot of work and return results quickly? For example, we only need "failed_jobs.csv"; none of the other metrics are needed.
Describe the solution you'd like
N/A
Describe alternatives you've considered
N/A
Additional context
If possible, the Profiling tool could accept an extra argument pointing to a config file (JSON, YAML, or any format) in which I can say "I only need failed_jobs.csv" as output.
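To illustrate the idea (purely hypothetical: no such config file, keys, or flag exist in the tool today), the config could look something like this, passed via a made-up flag such as `--output-config filter.yaml`:

```yaml
# Hypothetical output filter for the Profiling tool (illustrative only).
# Any report not listed under "include" would be skipped entirely.
output:
  include:
    - failed_jobs.csv
```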
The issue description does not make the context clear.
Is this the Python CLI (rapids-tools) or the Java command? There are ways to speed up the tools, but they depend on how the tool is triggered (the number of eventlogs processed in parallel and the memory allocated to the process); see the sketch after the questions below.
Is it the writing to disk or the processing of the eventlogs that is taking a long time?
If writing to disk is taking long, then it is doable to allow only a subset of files to be written.
If the processing is taking long, then we go back to the first question to see why it takes that much time.
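For reference, a rough sketch of those two knobs when invoking the Java command directly. The jar version, Spark path, and eventlog paths are placeholders, and the specific values are examples, not recommendations:

```bash
# Sketch: give the JVM more heap and process more eventlogs in parallel.
# Adjust the jar version and paths to your environment.
java -Xmx24g \
  -cp rapids-4-spark-tools_2.12-<version>.jar:$SPARK_HOME/jars/* \
  com.nvidia.spark.rapids.tool.profiling.ProfileMain \
  --num-threads 8 \
  --output-directory ./profiling_output \
  /path/to/eventlogs
```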
Regarding the feature request: it is quite tough to implement such a request.
There are heavy dependencies between the pieces of data that are extracted.
In some cases, the output cannot be interpreted unless it is complete. For example, users might need to look at the SQL failures or the stages in order to understand what a failed job was doing. This implies they would have to rerun the tool again to generate that file.
If we try to refactor the code so that each feature runs independently, the runtime will end up much longer because of the cost of isolating the variables each feature depends on.