-
It may also require downscaling the […] If this still doesn't work, or turns out to be too brutal and destroys the models' outputs, it should be possible to train […]
-
There also seem to be some interesting implications from there actually being a definite additive "bias" along some of the control vector axes: this might open up the possibility of trying to find […]
-
So recall from the other post that we are going to try to transform `h` like so:

```
h = h + h*V^T*U
```

and assuming we have two data matrices generated in the same way as the current control vector code:

- `A` = an `m x n` matrix of samples for the negative prompts.
- `B` = an `m x n` matrix of samples for the positive prompts.

we can set up this system:
```
A*X = B
```

where `X` is an `n x n` square matrix which is the transformation that takes `A` onto `B`.

If `A` and `B` were also `n x n` and full-rank, we could calculate:

```
A*X = B
A^-1*A*X = A^-1*B
I*X = A^-1*B
X = A^-1*B
```
So now we have a square matrix `X` that when applied to `A` transforms it into `B` (i.e. it takes all our "negative prompts" to the equivalent "positive prompts" [and likely buggers up all the "positive prompts" and everything else in the process!]).

BUT: since `A` and `B` are rectangular, we need to solve in terms of least squares via the Moore–Penrose pseudoinverse:

```
A*X = B
A^+*A*X = A^+*B
X = A^+*B
```

(or via direct solution of the least-squares problem...)
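As a rough NumPy sketch of that solve (the array names and sizes here are illustrative stand-ins, not taken from the actual control vector code), assuming the rows of `A` and `B` are paired samples:

```python
import numpy as np

m, n = 1024, 256            # illustrative: samples x hidden dimension
A = np.random.randn(m, n)   # stand-in for the negative-prompt samples
B = np.random.randn(m, n)   # stand-in for the positive-prompt samples

# Direct least-squares solution of A @ X = B; equivalent to X = pinv(A) @ B
# but better conditioned than forming the pseudoinverse explicitly.
X, residuals, rank, sv = np.linalg.lstsq(A, B, rcond=None)
assert X.shape == (n, n)
```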
So now we have a square matrix `X` that when applied to `A` somewhat (but not exactly) transforms it into `B` (and also likely buggers up everything else in the process still!).

So now if we take the SVD of `X` we get:

```
X = U*S*V^T
```

(NOTE: `X` is not symmetric, so an eigendecomposition would end up giving complex vectors...)
and truncate to just use the top `k` singular vectors/values, and also distribute the `sqrt(S_i)` between the corresponding `u_i` and `v_i` vectors, we should hopefully get the rank-`k` matrices `U` and `V` we want.

NOTE: By truncating the SVD like this we are in effect performing a form of spectral regularization, and it may well turn out that the singular values drop off very quickly (as was the case for the control vectors).
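A minimal sketch of that truncation step (the function name and exact factor layout are my assumptions; the key property is just that `V^T*U` reproduces the rank-`k` part of `X`):

```python
import numpy as np

def truncated_factors(X: np.ndarray, k: int):
    """Rank-k factors of X with sqrt(S_i) split between both sides.

    Returns (U, V), each of shape (k, n), such that V.T @ U is the
    rank-k truncated-SVD approximation of X -- i.e. the term used in
    the h = h + h*V^T*U update above.
    """
    U_svd, S, Vt_svd = np.linalg.svd(X, full_matrices=False)
    sqrt_S = np.sqrt(S[:k])
    V = sqrt_S[:, None] * U_svd[:, :k].T  # rows: sqrt(s_i) * u_i
    U = sqrt_S[:, None] * Vt_svd[:k, :]   # rows: sqrt(s_i) * v_i
    return U, V
```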
In practice the "baseline" data matrix will be subtracted from `A` and `B` in the same way as we did for the control vectors, and clearly this is going to need some kind of careful regularisation to work in practice, but I think it should be a good starting point to try.
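Putting it together, a rough end-to-end sketch under the assumptions above (the baseline handling here is my reading of the description, not code from the project):

```python
import numpy as np

m, n, k = 1024, 256, 8
baseline = np.random.randn(m, n)          # stand-in baseline-prompt samples
A = np.random.randn(m, n) - baseline      # negative prompts, baseline-subtracted
B = np.random.randn(m, n) - baseline      # positive prompts, baseline-subtracted

X, *_ = np.linalg.lstsq(A, B, rcond=None) # least-squares solve of A @ X = B
U, V = truncated_factors(X, k)            # from the sketch above

h = np.random.randn(n)                    # one hidden state at inference time
h_new = h + h @ V.T @ U                   # the proposed h = h + h*V^T*U update
```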