[TASK] Improve core's memory utilization by reducing storage of Accum task updates #1545

amahussein · 2025-02-13T21:36:27Z

Is your feature request related to a problem? Please describe.

The core module uses a significant amount of heap memory to process the eventlog successfully.
After several iterations of profiling to enumerate the bottlenecks, it was found that around 60% of the heap is occupied by the Task accumulable updates.

The flow is as follows:

A taskEnd event is scanned.
For each accumulable in the taskEnd event, a new entry (long, long) is added to AccumInfo.taskUpdateMap.
This map is kept in memory.
The map is later accessed to aggregate the statistics of the accumulables across the tasks of a specific stage.

In the above flow, the task accumulables are stored only so that the stats can be built. Given that the total is already stored as part of the StageAccumulable, it is clear that taskUpdateMap represents a huge overhead.
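To make the overhead concrete, here is a minimal sketch of the flow described above. It is not the actual core code: `AccumInfoSketch`, `addTaskUpdate`, and `statsFor` are hypothetical names, and the sketch assumes the (long, long) entry maps a task ID to that task's update value.

```scala
import scala.collection.mutable

// Hypothetical stand-in for AccumInfo; only taskUpdateMap is named in the issue.
final class AccumInfoSketch {
  // Assumed layout: taskId -> update value reported by that task.
  // One entry per (task, accumulable) pair stays on the heap for the whole run.
  val taskUpdateMap: mutable.HashMap[Long, Long] = mutable.HashMap.empty[Long, Long]

  // Called once for every accumulable found in a taskEnd event.
  def addTaskUpdate(taskId: Long, update: Long): Unit =
    taskUpdateMap(taskId) = update

  // The stored updates are read only here, to build (min, max, total) for one stage.
  def statsFor(stageTaskIds: Iterable[Long]): Option[(Long, Long, Long)] = {
    val values = stageTaskIds.flatMap(id => taskUpdateMap.get(id)).toSeq
    if (values.isEmpty) None else Some((values.min, values.max, values.sum))
  }
}
```

In this sketch the map grows with the number of tasks for every accumulable, even though the only consumer is the per-stage aggregation.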

Describe the solution you'd like

A quick workaround is to skip storing the individual task-to-accumulable updates. Instead, the core aggregates the statistics per stage on the fly.
This requires the following:

Approximate the moving average of the task updates.
With every task end, update the statistics of the stage. In other words, we create an extra statistics object per stage, but hopefully this still uses less memory than the task updates would.
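The on-the-fly aggregation could look roughly like the sketch below. The names `RunningStats`, `StageAccumStats`, and `onTaskEnd` are hypothetical and do not come from the core module. The sketch keeps a running count, sum, min, and max per stage, so the mean is available without storing any task update. Statistics that need the full set of values, such as an exact median, cannot be recovered this way and would have to be approximated.

```scala
import scala.collection.mutable

// Hypothetical fixed-size statistics object, updated once per task end.
final class RunningStats {
  private var n: Long = 0L
  private var total: Long = 0L
  private var minV: Long = Long.MaxValue
  private var maxV: Long = Long.MinValue

  def update(value: Long): Unit = {
    n += 1
    total += value
    if (value < minV) minV = value
    if (value > maxV) maxV = value
  }

  def count: Long = n
  def sum: Long = total
  def min: Long = minV
  def max: Long = maxV
  // Running mean derived from the sum and count; 0.0 when no update was seen.
  def mean: Double = if (n == 0) 0.0 else total.toDouble / n
}

// Hypothetical per-accumulable holder: one RunningStats per stage
// instead of one map entry per task.
final class StageAccumStats {
  private val statsByStage = mutable.HashMap.empty[Int, RunningStats]

  // Fold the task's update into its stage statistics and discard the value.
  def onTaskEnd(stageId: Int, update: Long): Unit =
    statsByStage.getOrElseUpdate(stageId, new RunningStats).update(update)

  def stats(stageId: Int): Option[RunningStats] = statsByStage.get(stageId)
}
```

With this layout the memory per accumulable grows with the number of stages rather than the number of tasks, which is the saving the workaround is after.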

Describe alternatives you've considered

We can free memory by offloading the taskUpdateMap to disk (using a store such as RocksDB or a binary file format such as Parquet), then loading the updates by stageID as necessary.
This is the ultimate fix for the problem; however, it would require significant code rewriting and synchronization. Since the task-accumulable data is not used so far in the core, the effort would not be justified.

amahussein added labels Feb 13, 2025: core_tools (scope: the core module, Scala), performance (performance and scalability of tools)

amahussein assigned sayedbilalbari Feb 13, 2025

This was referenced Feb 13, 2025

[FEA] Qualification tool: Improve memory consumption while processing large eventlogs. #815

Open

Memory Optimization - AccumInfo Refactor #1543

Open
