automate performance tuning #61

Open
vadasg opened this issue Nov 10, 2012 · 0 comments


vadasg commented Nov 10, 2012

For optimal MapReduce performance, a sequence file should be generated first, and several parameters must be tuned. This process could be automated:

  1. Start with a default set of values for parameters such as:
     - `mapred.map.child.java.opts`
     - `mapred.reduce.child.java.opts`
     - `mapred.map.tasks`
     - `mapred.reduce.tasks`
     - etc.

  2. Attempt to generate the sequence file with `g._()`. Determine the size of the biggest vertex in memory during this process. If a vertex doesn't fit in memory (so `g._()` fails), automatically increase the memory and retry until it succeeds. If it fails for other reasons, iteratively change the parameters above until it succeeds. Restart RegionServers automatically as needed.

  3. Once the sequence file is generated, determine optimal parameters based on the graph, the available memory, and the number of processors. Generate a parameter file for future analyses.
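Step 3 could start from a simple heuristic. The Python sketch below is hypothetical, not an existing tool: `ClusterInfo`, `derive_parameters`, the 4 GB memory reserve for the OS/DataNode/RegionServer, "one map slot and one reduce slot per core", and the "heap must hold twice the biggest vertex" headroom are all assumptions. The reduce count uses the 0.95 × nodes × reduce-slots rule of thumb from the Hadoop MapReduce tutorial. The result is written as `key=value` lines so it can be reused as a parameter file.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterInfo:
    nodes: int               # worker (TaskTracker) nodes
    cores_per_node: int      # processors per node
    mem_mb_per_node: int     # physical memory per node, in MB
    reserved_mb: int = 4096  # hypothetical reserve for OS, DataNode, RegionServer


def derive_parameters(cluster: ClusterInfo, biggest_vertex_mb: int) -> dict:
    """Hypothetical heuristic: one map slot and one reduce slot per core."""
    slots = cluster.cores_per_node
    usable_mb = cluster.mem_mb_per_node - cluster.reserved_mb
    # Map and reduce slots can run at the same time, so split memory over both.
    heap_mb = usable_mb // (2 * slots)
    # Assumed headroom: a task heap should hold twice the biggest vertex.
    if heap_mb < 2 * biggest_vertex_mb:
        raise ValueError(f"{heap_mb} MB per task is too small for a "
                         f"{biggest_vertex_mb} MB vertex")
    # Rule of thumb from the Hadoop tutorial: 0.95 * nodes * reduce slots.
    reduce_tasks = max(1, math.floor(0.95 * cluster.nodes * slots))
    return {
        "mapred.map.child.java.opts": f"-Xmx{heap_mb}m",
        "mapred.reduce.child.java.opts": f"-Xmx{heap_mb}m",
        "mapred.map.tasks": cluster.nodes * slots,  # only a hint to Hadoop
        "mapred.reduce.tasks": reduce_tasks,
    }


def format_parameter_file(params: dict) -> str:
    """Render key=value lines (Java properties style), sorted by key."""
    return "".join(f"{key}={params[key]}\n" for key in sorted(params))


def write_parameter_file(params: dict, path: str) -> None:
    """Save the parameters so later analyses can reuse them."""
    with open(path, "w") as f:
        f.write(format_parameter_file(params))


if __name__ == "__main__":
    cluster = ClusterInfo(nodes=4, cores_per_node=8, mem_mb_per_node=32768)
    print(format_parameter_file(derive_parameters(cluster, biggest_vertex_mb=512)))
```

For a 4-node cluster with 8 cores and 32 GB per node this gives a 1792 MB heap per task, 32 map tasks and 30 reduce tasks. The biggest-vertex size measured in step 2 acts as a lower bound on the heap rather than being tuned directly.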

The idea is to have a possibly long but fully automated process that eventually converges on reasonably well-optimized parameters.
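The convergence loop from step 2 might look roughly like the Python sketch below. Everything here is hypothetical: `run_job` stands in for whatever actually launches the `g._()` job, `VertexTooLargeError` and `JobFailedError` stand in for however those two failure modes get detected, and the policies (double the heap up to a cap on out-of-memory, double the reducers and restart RegionServers on other failures) are assumed, not prescribed.

```python
from typing import Callable, Dict, Optional


class VertexTooLargeError(Exception):
    """Hypothetical: a vertex did not fit in a task's heap."""


class JobFailedError(Exception):
    """Hypothetical: any other job failure (timeouts, lost RegionServers, ...)."""


def parse_heap_mb(opts: str) -> int:
    """Read N from a java.opts value that is exactly "-Xmx<N>m" (an assumption)."""
    if not (opts.startswith("-Xmx") and opts.endswith("m")):
        raise ValueError(f"unsupported java opts: {opts!r}")
    return int(opts[len("-Xmx"):-1])


def tune_until_success(
    run_job: Callable[[Dict], None],
    params: Dict,
    max_heap_mb: int,
    max_attempts: int = 10,
    restart_regionservers: Optional[Callable[[], None]] = None,
) -> Dict:
    """Retry the job, adjusting a copy of params after each failure."""
    params = dict(params)  # never mutate the caller's defaults
    for _ in range(max_attempts):
        try:
            run_job(params)
            return params  # converged: this parameter set worked
        except VertexTooLargeError:
            heap = parse_heap_mb(params["mapred.map.child.java.opts"])
            if heap >= max_heap_mb:
                raise  # already at the cap, memory alone cannot fix it
            new_heap = min(2 * heap, max_heap_mb)
            for key in ("mapred.map.child.java.opts", "mapred.reduce.child.java.opts"):
                params[key] = f"-Xmx{new_heap}m"
        except JobFailedError:
            # Assumed policy: spread the work over more reducers, then retry.
            params["mapred.reduce.tasks"] = 2 * params["mapred.reduce.tasks"]
            if restart_regionservers is not None:
                restart_regionservers()
    raise RuntimeError(f"no working parameters after {max_attempts} attempts")


def fake_job(params: Dict) -> None:
    # Stand-in for the real job: pretend the biggest vertex needs ~3000 MB.
    if parse_heap_mb(params["mapred.reduce.child.java.opts"]) < 3000:
        raise VertexTooLargeError("vertex does not fit in heap")


start = {
    "mapred.map.child.java.opts": "-Xmx1024m",
    "mapred.reduce.child.java.opts": "-Xmx1024m",
    "mapred.reduce.tasks": 4,
}
tuned = tune_until_success(fake_job, start, max_heap_mb=8192)
print(tuned["mapred.reduce.child.java.opts"])  # prints -Xmx4096m
```

Doubling keeps the number of retries logarithmic in the final heap size, which matters when each attempt is a full MapReduce run. The trade-off is that the final heap can overshoot what the biggest vertex strictly needs.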
