Any chance to make ur have no requirement on single executor machine's memory? #37

Open
WayneWang12 opened this issue Sep 27, 2017 · 2 comments

Comments


WayneWang12 commented Sep 27, 2017

I'm researching this model, and it is really awesome for small companies like ours.
I easily trained a model on 10 million trading orders. However, when I increased the number to 100 million, the model could not be trained.

We actually have a cluster with about 1TB of memory in total: 20 nodes with 64GB each. But this model seems to be limited by the memory of a single machine, and 64GB is obviously not enough for 100 million orders. I'm wondering whether there is any chance of making the model not depend on a single machine's memory. I think 1TB in total should be quite enough; the bottleneck is the memory of a single machine.

The driver is OK: I can find a temporary machine with 128GB or 256GB for a day. But I can't do the same for the executor machines, because they are permanent and I might have to upgrade all of them.

Or is there any way to make the executors run on high-memory machines?

WayneWang12 changed the title from "Any chance to make ur have no request on single machine memory?" to "Any chance to make ur have no requirement on single machine memory?" Sep 27, 2017
WayneWang12 changed the title from "Any chance to make ur have no requirement on single machine memory?" to "Any chance to make ur have no requirement on single executor machine's memory?" Sep 27, 2017
Collaborator

pferrel commented Sep 27, 2017 via email

@WayneWang12
Author

OK, I see. But why does the training in map at package.scala:96 take so long? The training has already been running for 16.4 hours and we have seen no result yet. It also uses only 12 cores, while I have more than 400.

Is there a way to make it quicker? I'll also post this to the group.
