
Memory Explosion #5

Open
pedrocolon93 opened this issue Apr 29, 2019 · 10 comments
@pedrocolon93

Hi there,

I have been trying to run this for the past couple of weeks, and it seems you need a beefy computer to get through the process. In the reader.py code, where multiprocessing is used, at some point during planning the system either recurses very deeply or explores too many branches, and eventually runs out of memory. I've tried this on a machine with 64 GB of RAM and it eats that up too. The problem occurs around iteration 890 in the create_plans method of the DataReader class. I'm going to try to debug this and see if I can limit depth, size, or run time, treating a hit limit as the lack of a feasible plan. A small memory optimization is to replace Pool with ThreadPool.
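For reference, the Pool-to-ThreadPool swap is mechanical, since ThreadPool exposes the same interface; threads share the parent's memory instead of each forked worker carrying its own copy. A minimal sketch (plan_graph is a hypothetical stand-in for reader.py's actual worker function):

```python
# Sketch of the Pool -> ThreadPool swap. ThreadPool workers are
# threads sharing one address space, so the planner's memory
# footprint is not multiplied per worker as it can be with Pool.
from multiprocessing.pool import ThreadPool  # instead of: from multiprocessing import Pool

def plan_graph(graph_id):
    # Hypothetical worker; reader.py's real worker plans a graph.
    return graph_id * 2

with ThreadPool(processes=4) as pool:
    results = pool.map(plan_graph, range(8))
print(results)
```

Note that threads are subject to the GIL, so this trades CPU parallelism for a smaller memory footprint when the workload is not purely Python-bound.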

@AmitMY
Owner

AmitMY commented Apr 29, 2019

So sorry about that! You are correct, I am running this on a beefy server...

Going over the plans with an iterator is possible, but for now here are 2 possible solutions:

  • In the following line, you can change is_parallel to False. This will plan only one graph at a time, so if your graphs are not too large it will work, but it will take a long time (for example, about 1.5 hours for the WebNLG test set):
    is_parallel = True
  • Another possible solution, which works on any graph size and takes roughly 0 seconds per graph, is to use the NeuralPlanner instead. https://github.com/AmitMY/chimera/blob/master/planner/neural_planner.py
    This planner is part of ongoing research on how to do online planning, and how to avoid the need for "experts" to score plans.
    On WebNLG the current version of this planner performs as well in terms of automatic metrics, but it hasn't been tested with human evaluation. There will be updates to it in the weeks to come!
    To switch to the Neural Planner, you need to uncomment this line:
    # neural_planner = NeuralPlanner()
    and change the config in this line to pass planner=neural_planner instead:
    planner=naive_planner)

    Also, you need to remove the directory cache/WebNLG/planner if it exists, for the new planner to initialize.
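Putting the two options together, the configuration edits would look roughly like this (a sketch; the exact file and surrounding code in the repository may differ):

```python
# Option 1: disable parallel planning to bound memory use.
is_parallel = False

# Option 2: switch from the naive planner to the neural planner.
neural_planner = NeuralPlanner()  # previously commented out

# ...and pass it into the pipeline config:
# planner=neural_planner)  # instead of planner=naive_planner)
```

Remember that after switching planners, the cache/WebNLG/planner directory must be deleted so the new planner initializes from scratch.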

The planner takes around 15 seconds to train on my machine. The inference time for 1,900 graphs of various sizes is 7 seconds on my machine.

The Neural Planner is built in such a way it can't create a wrong plan (it has to follow a structure, and has to include every fact from the graph).


Feel free to suggest any improvements to the naive planner / neural planner, or let me know if anything is not working for you, and I'll sort it out.

@pedrocolon93
Author

Awesome! I'm testing it out with the neural planner and I'll get back eventually on what happens!

@AmitMY
Owner

AmitMY commented Apr 29, 2019

Just to make sure you know, you don't need to restart the entire thing for the change to take effect, only remove cache/WebNLG/planner, so your translation model, and pre-processing is still cached.

@pedrocolon93
Author

pedrocolon93 commented Apr 30, 2019

Yup, I noticed! I tried to put it into the server to see if I could visualize it, but the neural planner is missing the score method implementation. I'm guessing it's missing because the planner just generates the plan directly.

@AmitMY
Owner

AmitMY commented Apr 30, 2019 via email

Yeah... still need to do that part. That is part of why it is not documented - ongoing research. And in the neural planner's case, getting the "best" plan is very fast, but scoring all plans would probably be slow.

@pedrocolon93
Author

pedrocolon93 commented Apr 30, 2019

If it gets the best plan, then that's the highest score (and if there were a way to vectorize the plans, the score could be scaled according to the closest plan vectors). Another thought: maybe something could be done like the discriminator in a GAN (just some shower thoughts).

@pedrocolon93
Author


Also, even if you set parallel to False, you still have the memory explosion at some point in the process, so I guess filtering the graph somehow to remove redundancies or make it smaller will have to be a thing. I'll see if I can come up with something.
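One cheap pre-filter along those lines is deduplicating the input triples before planning, so the traversal tree does not branch on redundant edges. A hypothetical sketch (chimera's actual graph representation differs):

```python
# Hypothetical sketch: drop exact-duplicate (subject, relation, object)
# triples before planning, preserving the original order. This shrinks
# the traversal tree without changing which facts get expressed.
def dedupe_triples(triples):
    seen = set()
    out = []
    for triple in triples:
        if triple not in seen:
            seen.add(triple)
            out.append(triple)
    return out

triples = [
    ("A", "capital", "B"),
    ("A", "capital", "B"),  # redundant edge
    ("B", "country", "C"),
]
print(dedupe_triples(triples))
```

This only removes exact repeats; collapsing semantically redundant edges (e.g. inverse relations) would need knowledge of the relation inventory.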

@AmitMY
Owner

AmitMY commented May 1, 2019

Ok. So, I'll explain what I think a solution can be, but I don't have time to test this until the end of NAACL (mid-June) - Though I did implement it.

The "best_plan" planning is done in 3 stages:

  1. Create a tree of possible traversals over the graph. While possibly large, this shouldn't be a problem in terms of memory.
  2. Linearize the tree into every traversal it contains, and create a string for each. This is very expensive.
  3. Score all plans and choose the best one.

Step 1 is not a problem. Step 2, however, can be changed to return generators instead of lists, such that in the following code:

chimera/utils/graph.py

Lines 74 to 85 in 8f6df05

    def linearizations(self):
        none_empty = [s for s in self.rec_linearizations() if len(s) > 0]
        return [" ".join(s[:-1]) for s in none_empty if s[-1] != NodeType.FILTER_OUT]

    def rec_linearizations(self):
        if self.next is None:
            return [[self.value]]
        if self.value == NodeType.OR:
            return [l for n in self.next for l in n.rec_linearizations()]
        return [[self.value] + l for n in self.next for l in n.rec_linearizations()]

in lines 76, 80, 83, and 85, instead of building a list with square brackets, we use a generator expression with parentheses, like:

chimera/utils/graph.py

Lines 74 to 85 in e939eac

    def linearizations(self):
        none_empty = (s for s in self.rec_linearizations() if len(s) > 0)
        return (" ".join(s[:-1]) for s in none_empty if s[-1] != NodeType.FILTER_OUT)

    def rec_linearizations(self):
        if self.next is None:
            return [[self.value]]
        if self.value == NodeType.OR:
            return (l for n in self.next for l in n.rec_linearizations())
        return ([self.value] + l for n in self.next for l in n.rec_linearizations())

I have done the same for StructuredNode, which is a tiny bit more complex, but same idea.
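The effect of that bracket-to-parenthesis change can be seen in isolation, independent of chimera's classes: a list comprehension materializes every element up front, while the equivalent generator expression yields them one at a time.

```python
import sys

# A list comprehension allocates storage for all elements immediately.
as_list = [i * i for i in range(100000)]
# The equivalent generator expression produces elements lazily; the
# generator object itself stays tiny no matter how many items it yields.
as_gen = (i * i for i in range(100000))

# The generator object is far smaller than the materialized list.
assert sys.getsizeof(as_gen) < sys.getsizeof(as_list)

# Both produce the same sequence when consumed.
assert sum(as_gen) == sum(as_list)
```

The caveat is that a generator can only be consumed once, which is why step 3 below has to switch from indexing into the plans to a single streaming pass.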

Step 3 now becomes the problem, as it does not work with generators:

    if len(all_plans) == 0:
        return ""
    all_scores = [self.scorer.score(p) for p in all_plans]
    max_i = np.argmax(all_scores)
    return all_plans[max_i]

So I changed it as well, to the most basic O(1)-memory max implementation:

    best_plan = ""
    best_plan_score = float("-inf")
    for plan in all_plans:
        score = self.scorer.score(plan)
        if score > best_plan_score:
            best_plan_score = score
            best_plan = plan
    return best_plan
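Exercised standalone with a stand-in scorer (hypothetical; chimera's scorer API differs), the streaming max looks like this:

```python
# Toy demonstration of the O(1)-memory max above: plans are consumed
# from a generator one at a time, so they never all live in memory.
def best_of(plans, score):
    best_plan, best_score = "", float("-inf")
    for plan in plans:
        s = score(plan)
        if s > best_score:
            best_score, best_plan = s, plan
    return best_plan

plans = (p for p in ["A B", "A B C", "C"])  # stand-in plan strings
print(best_of(plans, len))  # len as a stand-in scorer -> "A B C"
```

Initializing the running score to negative infinity (rather than 0) matters: it keeps the loop correct even when every plan scores zero or below.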

If you want to test this, it is on the development branch, but it is not stable (results will be lower than what you expect, or it just won't work). If this is not pressing for you, I would wait, but if it is, you can also just take the last 3 commits to that branch, which are only these changes.

@AmitMY
Owner

AmitMY commented May 3, 2019

I have implemented a fast scorer for the neural planner. Around 500 plans a second on average, depending on how long they are.
I will push that in the next few days, so you don't need to copy all the above code :)

@pedrocolon93
Author

I haven't had a chance to test it, but awesome!
