Blockwise OP (ALS) on task graph
Alternating Least Squares is a block-wise optimization and the common method for matrix factorization, and the ability to handle large-scale ALS makes it a good test for any ML platform. Matrix factorization is a somewhat underserved area, so implementing this has real practical value.
Assume we have an MxN observation matrix A, and we want to approximate it with the product of two matrices D*T, where D and T have dimensions MxK and KxN. The general idea is then simply to fix T while finding a better D, then fix D while finding a better T.
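As a concrete reference point, here is a minimal in-memory sketch of one alternating step, assuming a plain squared-error loss and no regularization (the array names and sizes are illustrative; the distributed version discussed below shards these matrices across tasks).

```python
import numpy as np

def als_step(A, D, T):
    # Fix T, solve the least-squares problem for the rows of D:
    # D = A T^T (T T^T)^{-1}
    D = np.linalg.solve(T @ T.T, T @ A.T).T
    # Fix D, solve the least-squares problem for the columns of T:
    # T = (D^T D)^{-1} D^T A
    T = np.linalg.solve(D.T @ D, D.T @ A)
    return D, T

M, N, K = 100, 80, 10
A = np.random.rand(M, N)
D, T = np.random.rand(M, K), np.random.rand(K, N)
for _ in range(20):
    D, T = als_step(A, D, T)
print(np.linalg.norm(A - D @ T))  # reconstruction error shrinks each epoch
```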
To describe it in a topic modeling framework, A has documents as rows and words as columns, D holds the mixing proportions, and T is the topic model. There can be many different loss functions and conditions; we will only assume that we can compute the gradient of the loss function.
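For example, under the squared loss L = 0.5 * ||A - D @ T||_F^2, the two gradients look like the sketch below; a different loss only changes these two functions, the rest of the task graph stays the same.

```python
import numpy as np

def grad_D(A, D, T):
    # dL/dD = -(A - D T) T^T, an M x K matrix
    return -(A - D @ T) @ T.T

def grad_T(A, D, T):
    # dL/dT = -D^T (A - D T), a K x N matrix
    return -D.T @ (A - D @ T)
```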
To scale the factorization, we need to keep two copies of A: one sharded by row (document) and another sharded by column (word). This basically means that we will have two types of tasks in the task graph. A row task holds a document shard, i.e. the row slices of both A and D, and a column task holds a word shard, i.e. the column slices of both A and T. Hopefully by now it is clear that we can optimize D and T locally in the row tasks and column tasks respectively. Note also that we might not want to place these two task types on different machines, as that would leave half of the machines idle. Depending on how the data are distributed across shards, one might be able to use a combined task that acts as a row task in even epochs and a column task in odd epochs. Either way, we need to address parameter sharing.
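A sketch of the two shardings, assuming a plain dict-style task graph; the keys "row_shard" / "col_shard" and the shard counts are hypothetical names for illustration.

```python
import numpy as np

M, N, K = 100, 80, 10
R, C = 4, 2                      # R row shards, C column shards
A = np.random.rand(M, N)
D = np.random.rand(M, K)
T = np.random.rand(K, N)

row_slices = np.array_split(np.arange(M), R)
col_slices = np.array_split(np.arange(N), C)

graph = {}
for i, rows in enumerate(row_slices):
    # Row task i holds the row slices of A and D for its documents.
    graph[("row_shard", i)] = (A[rows, :], D[rows, :])
for j, cols in enumerate(col_slices):
    # Column task j holds the column slices of A and T for its words.
    graph[("col_shard", j)] = (A[:, cols], T[:, cols])
```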
Take a row task as an example: at the beginning of its work, it potentially needs all of the T shards. The easiest thing to do is simply to have a fully connected bipartite graph going from the T shards to the row tasks, and I think that is what we should try. We do not even need broadcast variables or accumulators in this case.
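A minimal sketch of what that bipartite dependency could look like: every row task gathers all column shards of T, reassembles the full T, and updates its own slice of D locally (the function and key names are hypothetical; the update reuses the dense ALS step above).

```python
import numpy as np

def update_row_shard(A_rows, T_shards):
    # Gather: concatenate all column shards of T back into a full K x N matrix.
    T = np.concatenate(T_shards, axis=1)
    # Local least-squares update for this shard's rows of D.
    return np.linalg.solve(T @ T.T, T @ A_rows.T).T

# In the task graph, row task i would depend on every T shard:
#   ("update_D", i): (update_row_shard, ("A_rows", i),
#                     [("T_shard", j) for j in range(C)])
# i.e. a fully connected bipartite edge set from all T shards to each row task;
# the column-task update is symmetric, gathering all D shards instead.
```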