
Memory Explosion #5

Open
pedrocolon93 opened this issue Apr 29, 2019 · 10 comments
@pedrocolon93

Hi there,

I have been trying to run this for the past couple of weeks, and it seems you need a beefy computer to get through the process. In the reader.py code, where multiprocessing is used, at some point during planning the system either recurses very deeply or explores too many branches, and eventually runs out of memory. I've tried this on a machine with 64 GB of RAM and it eats that up too. The problem occurs around iteration 890 in the create_plans method of the DataReader class. I'm going to try to debug this and see if I can limit depth, size, or run time, treating a hit limit as the lack of a feasible plan. A small memory optimization is to replace Pool with ThreadPool.
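For reference, the Pool-to-ThreadPool swap is mechanical, since ThreadPool exposes the same interface; threads share the parent's memory instead of each forked worker carrying its own copy. A minimal sketch (plan_graph is a hypothetical stand-in for reader.py's actual worker function):

```python
# Sketch of the Pool -> ThreadPool swap. ThreadPool workers are
# threads sharing one address space, so the planner's memory
# footprint is not multiplied per worker as it can be with Pool.
from multiprocessing.pool import ThreadPool  # instead of: from multiprocessing import Pool

def plan_graph(graph_id):
    # Hypothetical worker; reader.py's real worker plans a graph.
    return graph_id * 2

with ThreadPool(processes=4) as pool:
    results = pool.map(plan_graph, range(8))
print(results)
```

Note that threads are subject to the GIL, so this trades CPU parallelism for a smaller memory footprint when the workload is not purely Python-bound.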

@AmitMY
Owner

AmitMY commented Apr 29, 2019

So sorry about that! You are correct, I am running this on a beefy server...

Going over the plans with an iterator is possible, but for now here are 2 possible solutions:

  • In the following line, you can change is_parallel to False. This will plan only one graph at a time, so if your graphs are not too large it will work, but it will take a long time (for example, about 1.5 hours for the WebNLG test set):
    is_parallel = True
  • Another possible solution, which works on any graph size and takes roughly 0 seconds per graph, is to use the NeuralPlanner instead. https://github.com/AmitMY/chimera/blob/master/planner/neural_planner.py
    This planner is part of ongoing research on how to do online planning, and how to avoid the need for "experts" to score plans.
    On WebNLG the current version of this planner performs as well in terms of automatic metrics, but it hasn't been tested with human evaluation. There will be updates to it in the weeks to come!
    To switch to the Neural Planner, you need to uncomment this line:
    # neural_planner = NeuralPlanner()
    and change the config in this line to pass planner=neural_planner instead:
    planner=naive_planner)

    Also, you need to remove the directory cache/WebNLG/planner if it exists, for the new planner to initialize.
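Putting the two options together, the configuration edits would look roughly like this (a sketch; the exact file and surrounding code in the repository may differ):

```python
# Option 1: disable parallel planning to bound memory use.
is_parallel = False

# Option 2: switch from the naive planner to the neural planner.
neural_planner = NeuralPlanner()  # previously commented out

# ...and pass it into the pipeline config:
# planner=neural_planner)  # instead of planner=naive_planner)
```

Remember that after switching planners, the cache/WebNLG/planner directory must be deleted so the new planner initializes from scratch.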

The planner takes around 15 seconds to train on my machine. The inference time for 1,900 graphs of various sizes is 7 seconds on my machine.

The Neural Planner is built in such a way it can't create a wrong plan (it has to follow a structure, and has to include every fact from the graph).


Feel free to suggest any improvements to the naive planner / neural planner, or let me know if anything is not working for you, and I'll sort it out.

@pedrocolon93
Author

Awesome! I'm testing it out with the neural planner and I'll get back eventually on what happens!

@AmitMY
Owner

AmitMY commented Apr 29, 2019

Just to make sure you know, you don't need to restart the entire thing for the change to take effect, only remove cache/WebNLG/planner, so your translation model, and pre-processing is still cached.

@pedrocolon93
Author

pedrocolon93 commented Apr 30, 2019

Yup, I noticed! I tried to put it into the server to see if I could visualize it, but the neural planner is missing the score method implementation. I'm guessing it's missing because the planner just generates the plan directly.

@AmitMY
Owner

AmitMY commented Apr 30, 2019 via email

Yeah... still need to do that part. That is part of why it is not documented - ongoing research. And in the neural planner's case, getting the "best" plan is very fast, but scoring all plans would probably be slow.

@pedrocolon93
Author

pedrocolon93 commented Apr 30, 2019

If it gets the best plan, then that's the highest score (and if there were a way to vectorize the plans, the score could be scaled according to the closest plan vectors). Another thought: maybe something could be done like the discriminator in a GAN (just some shower thoughts).

@pedrocolon93
Author


Also, even if you set parallel to False, you still have the memory explosion at some point in the process, so I guess filtering the graph somehow to remove redundancies or make it smaller will have to be a thing. I'll see if I can come up with something.
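One cheap pre-filter along those lines is deduplicating the input triples before planning, so the traversal tree does not branch on redundant edges. A hypothetical sketch (chimera's actual graph representation differs):

```python
# Hypothetical sketch: drop exact-duplicate (subject, relation, object)
# triples before planning, preserving the original order. This shrinks
# the traversal tree without changing which facts get expressed.
def dedupe_triples(triples):
    seen = set()
    out = []
    for triple in triples:
        if triple not in seen:
            seen.add(triple)
            out.append(triple)
    return out

triples = [
    ("A", "capital", "B"),
    ("A", "capital", "B"),  # redundant edge
    ("B", "country", "C"),
]
print(dedupe_triples(triples))
```

This only removes exact repeats; collapsing semantically redundant edges (e.g. inverse relations) would need knowledge of the relation inventory.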

@AmitMY
Owner

AmitMY commented May 1, 2019

Ok. So, I'll explain what I think a solution can be, but I don't have time to test this until the end of NAACL (mid-June) - Though I did implement it.

The "best_plan" planning is done in 3 stages:

  1. Create a tree of possible traversals over the graph. While possibly large, this shouldn't be a problem in terms of memory.
  2. Linearize the tree into every traversal it contains, and create a string for each. This is very expensive.
  3. Score all plans and choose the best one.

Step 1 is not a problem. Step 2, however, can be changed to return generators instead of lists, such that in the following code:

chimera/utils/graph.py

Lines 74 to 85 in 8f6df05

    def linearizations(self):
        none_empty = [s for s in self.rec_linearizations() if len(s) > 0]
        return [" ".join(s[:-1]) for s in none_empty if s[-1] != NodeType.FILTER_OUT]

    def rec_linearizations(self):
        if self.next is None:
            return [[self.value]]
        if self.value == NodeType.OR:
            return [l for n in self.next for l in n.rec_linearizations()]
        return [[self.value] + l for n in self.next for l in n.rec_linearizations()]

in lines 76, 80, 83, and 85, instead of building a list with square brackets, we use a generator expression with parentheses, like:

chimera/utils/graph.py

Lines 74 to 85 in e939eac

    def linearizations(self):
        none_empty = (s for s in self.rec_linearizations() if len(s) > 0)
        return (" ".join(s[:-1]) for s in none_empty if s[-1] != NodeType.FILTER_OUT)

    def rec_linearizations(self):
        if self.next is None:
            return [[self.value]]
        if self.value == NodeType.OR:
            return (l for n in self.next for l in n.rec_linearizations())
        return ([self.value] + l for n in self.next for l in n.rec_linearizations())

I have done the same for StructuredNode, which is a tiny bit more complex, but same idea.
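The effect of that bracket-to-parenthesis change can be seen in isolation, independent of chimera's classes: a list comprehension materializes every element up front, while the equivalent generator expression yields them one at a time.

```python
import sys

# A list comprehension allocates storage for all elements immediately.
as_list = [i * i for i in range(100000)]
# The equivalent generator expression produces elements lazily; the
# generator object itself stays tiny no matter how many items it yields.
as_gen = (i * i for i in range(100000))

# The generator object is far smaller than the materialized list.
assert sys.getsizeof(as_gen) < sys.getsizeof(as_list)

# Both produce the same sequence when consumed.
assert sum(as_gen) == sum(as_list)
```

The caveat is that a generator can only be consumed once, which is why step 3 below has to switch from indexing into the plans to a single streaming pass.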

Step 3 now becomes the problem, as it does not work with generators:

    if len(all_plans) == 0:
        return ""
    all_scores = [self.scorer.score(p) for p in all_plans]
    max_i = np.argmax(all_scores)
    return all_plans[max_i]

So I changed it as well, to the most basic O(1)-memory max implementation:

    best_plan = ""
    best_plan_score = float("-inf")
    for plan in all_plans:
        score = self.scorer.score(plan)
        if score > best_plan_score:
            best_plan_score = score
            best_plan = plan
    return best_plan
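Exercised standalone with a stand-in scorer (hypothetical; chimera's scorer API differs), the streaming max looks like this:

```python
# Toy demonstration of the O(1)-memory max above: plans are consumed
# from a generator one at a time, so they never all live in memory.
def best_of(plans, score):
    best_plan, best_score = "", float("-inf")
    for plan in plans:
        s = score(plan)
        if s > best_score:
            best_score, best_plan = s, plan
    return best_plan

plans = (p for p in ["A B", "A B C", "C"])  # stand-in plan strings
print(best_of(plans, len))  # len as a stand-in scorer -> "A B C"
```

Initializing the running score to negative infinity (rather than 0) matters: it keeps the loop correct even when every plan scores zero or below.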

If you want to test this, it is on the development branch, but it is not stable (results will be lower than what you expect, or it just won't work). If this is not pressing for you, I would wait, but if it is, you can also just take the last 3 commits to that branch, which are only these changes.

@AmitMY
Owner

AmitMY commented May 3, 2019

I have implemented a fast scorer for the neural planner. Around 500 plans a second on average, depending on how long they are.
I will push that in the next few days, so you don't need to copy all the above code :)

@pedrocolon93
Author

I haven't had a chance to test it, but awesome!
