Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Managing initilizations via a role scheme dictionary #1030

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

memimo
Copy link
Contributor

@memimo memimo commented Mar 15, 2016

fixes #970
Re-attempt of #1021
So far it only changes simple.py will propagate changes to other bricks when the format is approved.

if key in self.initialization_schemes:
raise ValueError("All initializations are accepted either"
"through initialization_schemes or "
"correspodong attribute but not both")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also s/attribute/keyword argument/

@dwf
Copy link
Contributor

dwf commented Mar 15, 2016

Technically this is @rizar's review but hopefully my comments from the peanut gallery are a little helpful to at least one of you 😛

@memimo
Copy link
Contributor Author

memimo commented Apr 5, 2016

@rizar could you please take a look at the high level structure of the PR. I'll take care of the comments of dwf when the high-level is approved.

@rizar
Copy link
Contributor

rizar commented Apr 5, 2016

Sure! Will work on it today.

@rizar
Copy link
Contributor

rizar commented Apr 6, 2016

Sorry for the delay, I am struggling with the cluster and my course project. I will definitely review it tomorrow.

@rizar
Copy link
Contributor

rizar commented Apr 7, 2016

I have three suggestions:

  • use the roles as the keys in initialization_schemes
  • initialize parameters using the scheme given for their most specific role
  • check that the given initialization schemes match the parameter roles of the brick in two places: in the end of constructor and in _push_initialization_config. This will allow to detect errors as early as possible.

@memimo
Copy link
Contributor Author

memimo commented Apr 12, 2016

I reorganized the PR a little bit. Again et's discuss the generality of the PR first, I will fix formatting stuff later.

Some notes:

1-The problem with having the roles as keys is that, roles are objects, and different objects of same class will be hashed to different values. So instead I use the name of the roles as the keys.
2-One major API change is that, if someone wants to manually change the schemes self.weights_init and self.biases_init is not valid anymore and one has to use self.initializations_schems
3-There is a tiny bit difference in the GRU weight initialization values now. I don't believe that will have any practical impact on performance, unless someone has an evidence against it.

@memimo memimo force-pushed the init branch 2 times, most recently from 22b5a09 to faf0d27 Compare April 12, 2016 19:27
@rizar
Copy link
Contributor

rizar commented Apr 13, 2016

  1. I think you gave up the idea of a role-indexed dictionary too early. Check https://docs.python.org/2/reference/datamodel.html#object.__hash__ , you should be able to overload this method.
  2. I thought we would support weights_init and biases_init through __getattr__, why not?

@dwf
Copy link
Contributor

dwf commented Apr 13, 2016

Indeed,

def __hash__(self):
    return hash(str(self))

should work in the case of roles.

if self.initialization_schemes is None:
self.initialization_schemes = {}

initialization_to_role = {"weights_init": WEIGHT, 'biases_init': BIAS,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, it looks like you have chosen to only handle a handful of <role>_init keyword arguments, not all of them. This is probably good, because this clearly makes initialization_schemes the primary way of defining the initialization schemes.

But the code clearly lacks a check if key not in initialization_to_role. Since now we have a clearly defined white list, I would probably raise a warning, that calling the arguments smth_init is recommended, but still pass this argument to the super().__init__ call.

@rizar
Copy link
Contributor

rizar commented Apr 25, 2016

I think the major design decision that I disagree with is having both initializable_roles and parameter_roles. Neither of the two is fully suitable for controlling how the initialization schemes are spread down the hierarchy. The former is hardcoded, and the latter is known too late, after the parameters are created.

Here is a different design which seems better to me

  • each Initilizable brick has an attribute parameter_roles that contains all the roles that the brick's parameters will have when constructed
  • the value of this attribute is defined as follows:
    • first, Initializable sets it to [WEIGHT, BIAS] (only [WEIGHT] is not use_bias)
    • ... unless the user provides parameter_roles as a constructor argument
    • then, after super(Initializable, self).__init__, Initializable._collect_roles is called to add
      the roles of the children to parameter_roles
  • we are still in Initializable.__init__, but we can already check that initialization_schemes does not contain schemes for the roles that are not in parameter_roles
  • furthermore, when the parameters are created (that is in allocate), we can check that all the parameters have roles that are present in parameter_roles
  • when initialization configuration is pushed (Initializable._push_initialization_config), only the schemes for the relevant roles should be pushed to the children. One should be careful here to check that sometimes initialization schemes for roles that are parents of the precise parameter roles should also be pushed.
  • in initialize, we can repeat the sanity check that we did in constructor, to make sure that user did not provide useless initialization schemes

What do you think?

self.parameter_roles.update(set([role]))

def _initialize(self):
for param in self.parameters:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a dangerous code. First, if a parameter has more than one role, it is not clear at all how it should be initialized. I would rather raise an exception upon encountering such a situation. Also, we should choose the most specific initialization scheme (e.g. the one for RECURRENT_WEIGHT and not for just WEIGHT if the parameter is RECURRENT_WEIGHT).

@rizar
Copy link
Contributor

rizar commented May 23, 2016

@memimo , we should resume working on this, and I really hope we can finish it until I leave Montreal on June 8.

@memimo memimo force-pushed the init branch 3 times, most recently from 893a301 to 45d4456 Compare June 1, 2016 17:57
@memimo memimo force-pushed the init branch 2 times, most recently from e7763cb to bc87d5a Compare June 1, 2016 18:17
@memimo
Copy link
Contributor Author

memimo commented Jun 1, 2016

@rizar could you take a look about the high level design choice?
Travis is falling on block_examples code, because some initilization values have changed. I will take care of that later, if the design is good.

if self.initialization_schemes is None:
self.initialization_schemes = {}

if parameter_roles:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hold on, haven't we decided to not have parameter_roles and instead collect roles the actual parameter variables?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it was the other way :p I guess it was a confusion. I'll give it a try, the other way around

@memimo
Copy link
Contributor Author

memimo commented Jun 2, 2016

@rizar ok, so this version collect roles from parameter tags. Will rebase later on.


super(Initializable, self).__init__(**kwargs)

def _validate_roles(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess here you propagate initialization schemes does the hierarchy of roles. But it seems to me that you do not choose the most specific role out of those for which an initialization scheme is available. E.g. if we have an init. scheme for PARAMETER and for WEIGHT, and now we are looking for an init. scheme for RECURRENT_WEIGHT, then we should always choose the scheme given for WEIGHT. In the code
below it seems like the choice will depend on the order of the dictionary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I find it a bit weird that you change initialization_schemes. This might make debugging quite a bit hard. I would rather put the search for the most specific init. scheme directly in initialize.

@memimo
Copy link
Contributor Author

memimo commented Jun 6, 2016

@rizar I updated based on your latest comments.

self.parameter_roles.update(child.parameter_roles)

def _initialize(self):
def get_scheme(role, schemes):
Copy link
Contributor

@rizar rizar Jun 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not seem to me that this code will choose the most specific rolefor which there is an intialization scheme.

Here roughly what I think should work:

def get_scheme(role, schemes):
   candidates = []
   for key in schemes:
       if isinstance(role, key):
           if all([isinstance(key, c) for c in candidates]):
               candidates = [c]
           else:
               candidates.append(c)
   if not candidates:
        raise ValueError("No init scheme")
   if len(candidates) > 1:
        raise ValueError("Multiple candidate init schemes")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure your code snippet do that either. Let's assume we have this class hierarchy A->B->C
a = A(), b=B(), c=C()
Then:
isinstance(c, A) -- True
type(c) == A -- False

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but all I am using is "isinstance", so what's wrong?

On 6 June 2016 at 16:44, Mehdi Mirza [email protected] wrote:

In blocks/bricks/interfaces.py
#1030 (comment):

  •        all_roles = []
    
  •        for param in obj.parameters:
    
  •            roles = param.tag.roles
    
  •            # TODO do something smarter
    
  •            if len(roles) > 0:
    
  •                all_roles.append(roles[0])
    
  •        return all_roles
    
  •    self.parameter_roles = set(get_param_roles(self))
    
  •    for child in self.children:
    
  •        if isinstance(child, Initializable):
    
  •            child._collect_roles()
    
  •            self.parameter_roles.update(child.parameter_roles)
    
  • def _initialize(self):
  •    def get_scheme(role, schemes):
    

I'm not sure your code snippet do that either. Let's assume we have this
class hierarchy A->B->C
a = A(), b=B(), c=C()
Then:
isinstance(c, A) -- True
type(c) == A -- False


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/mila-udem/blocks/pull/1030/files/f3fca5343847c265abe7b7c2424d585485476b76#r65965285,
or mute the thread
https://github.com/notifications/unsubscribe/AAn8Yul0FN9mBXfoNDAvqFDqOW5JOviZks5qJIbJgaJpZM4HxS-w
.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're not going up the hierarchy. Let's say if we have in schemes both C and B. And ideally we should choose the scheme associated with C not B, your code does not do that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Metaissue: Improve initialization
3 participants