automate performance tuning #61

vadasg · 2012-11-10T19:32:39Z

For optimal MapReduce performance, a sequence file should be generated and several parameters must be tuned. This process could be automated:

Start with some default set of values for parameters like

mapred.map.child.java.opts
mapred.reduce.child.java.opts
mapred.map.tasks
mapred.reduce.tasks
etc

Attempt to generate the sequence file with g.(). Determine the size of the biggest vertex in memory during this process. If a vertex doesn't fit in memory (so g.() fails), increase memory automatically and retry until it does. If it fails for other reasons, iteratively change the above parameters until it succeeds. Restart regionservers automatically as needed.
Once the sequence file is generated, determine optimal parameters based on the graph, the available memory, and the number of processors. Generate a parameter file for future analyses.

The idea is to have a possibly long but automated process that will eventually converge on reasonably well optimized parameters.
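
A rough sketch of what the retry loop might look like, in Python. Everything except the four parameter names is an assumption made for illustration: the default values, the `VertexTooLarge` exception, the `run_job` callback (a stand-in for whatever actually launches the sequence file job), the doubling strategy, and the heap cap. It only covers the "vertex doesn't fit in memory" case; other failures propagate unchanged, and regionserver restarts are not modeled.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical starting values; the issue names the keys but not the defaults.
DEFAULTS: Dict[str, str] = {
    "mapred.map.child.java.opts": "-Xmx512m",
    "mapred.reduce.child.java.opts": "-Xmx512m",
    "mapred.map.tasks": "4",
    "mapred.reduce.tasks": "4",
}

HEAP_KEYS = ("mapred.map.child.java.opts", "mapred.reduce.child.java.opts")


class VertexTooLarge(Exception):
    """Hypothetical signal that a vertex did not fit in task memory."""


def heap_mb(opts: str) -> int:
    """Extract the -Xmx value (in MB) from a JVM options string."""
    match = re.search(r"-Xmx(\d+)m", opts)
    if match is None:
        raise ValueError(f"no -Xmx<N>m setting in {opts!r}")
    return int(match.group(1))


def with_heap(opts: str, mb: int) -> str:
    """Return the options string with its -Xmx value replaced."""
    return re.sub(r"-Xmx\d+m", f"-Xmx{mb}m", opts)


def tune(
    run_job: Callable[[Dict[str, str]], None],
    params: Optional[Dict[str, str]] = None,
    max_heap_mb: int = 8192,
    max_attempts: int = 10,
) -> Dict[str, str]:
    """Retry run_job, doubling task heap whenever a vertex is too large."""
    current = dict(DEFAULTS if params is None else params)
    for _ in range(max_attempts):
        try:
            run_job(current)
            return current  # job succeeded with these settings
        except VertexTooLarge:
            doubled = {k: heap_mb(current[k]) * 2 for k in HEAP_KEYS}
            if any(mb > max_heap_mb for mb in doubled.values()):
                raise  # give up rather than exceed the assumed memory cap
            for key, mb in doubled.items():
                current[key] = with_heap(current[key], mb)
    raise RuntimeError(f"no working settings after {max_attempts} attempts")


def to_properties(params: Dict[str, str]) -> str:
    """Render the parameters as key=value lines for a parameter file."""
    return "".join(f"{k}={v}\n" for k, v in sorted(params.items()))
```

The result of `tune(...)` can be passed to `to_properties` and written to disk so that later analyses start from the settings that worked. Doubling is just the simplest growth rule; a real implementation could instead size the heap from the measured size of the biggest vertex.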

vadasg mentioned this issue Nov 19, 2012

add new Hadoop counters to track vertex size in bytes #64

Open
