-
It may also require downscaling the […] If this still doesn't work, or turns out to be too brutal and destroys the models' outputs, it should be possible to train […]
-
There also seem to be some interesting implications from there actually being a definite additive "bias" along some of the control vector axes: this might open up the possibility of trying to find […]
-
So recall from the other post that we are going to try to transform `h` like so:

```
h = h + h*V^T*U
```

and assuming we have two data matrices generated in the same way as the current control vector code:

- `A` = an `m x n` matrix of samples for the negative prompts.
- `B` = an `m x n` matrix of samples for the positive prompts.

we can set up this system:
```
A*X = B
```

where `X` is an `n x n` square matrix which is the transformation that takes `A` onto `B`.

If `A` and `B` were also `n x n` and full-rank, we could calculate:

```
A*X = B
A^-1*A*X = A^-1*B
I*X = A^-1*B
X = A^-1*B
```
So now we have a square matrix `X` that when applied to `A` transforms it into `B` (i.e. it takes all our "negative prompts" to the equivalent "positive prompts" [and likely buggers up all the "positive prompts" and everything else in the process!]).

BUT: since `A` and `B` are rectangular, we need to solve in terms of least squares via the Moore–Penrose pseudoinverse:

```
A*X = B
A^+*A*X = A^+*B
X = A^+*B
```

(or via direct solution of the least-squares problem...)
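As a rough NumPy sketch of that solve (the array names and sizes here are illustrative stand-ins, not taken from the actual control vector code), assuming the rows of `A` and `B` are paired samples:

```python
import numpy as np

m, n = 1024, 256            # illustrative: samples x hidden dimension
A = np.random.randn(m, n)   # stand-in for the negative-prompt samples
B = np.random.randn(m, n)   # stand-in for the positive-prompt samples

# Direct least-squares solution of A @ X = B; equivalent to X = pinv(A) @ B
# but better conditioned than forming the pseudoinverse explicitly.
X, residuals, rank, sv = np.linalg.lstsq(A, B, rcond=None)
assert X.shape == (n, n)
```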
So now we have a square matrix `X` that when applied to `A` somewhat (but not exactly) transforms it into `B` (and also likely buggers up everything else in the process still!).

So now if we take the SVD of `X` we get:

```
X = U*S*V^T
```

(NOTE: `X` is not symmetric, so an eigendecomposition would end up giving complex vectors...)
and truncate to just use the top `k` singular vectors/values, and also distribute the `sqrt(S_i)` between the corresponding `u_i` and `v_i` vectors, we should hopefully get the rank-`k` matrices `U` and `V` we want.

NOTE: By truncating the SVD like this we are in effect performing a form of spectral regularization, and it may well turn out that the singular values drop off very quickly (as was the case for the control vectors).
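A minimal sketch of that truncation step (the function name and exact factor layout are my assumptions; the key property is just that `V^T*U` reproduces the rank-`k` part of `X`):

```python
import numpy as np

def truncated_factors(X: np.ndarray, k: int):
    """Rank-k factors of X with sqrt(S_i) split between both sides.

    Returns (U, V), each of shape (k, n), such that V.T @ U is the
    rank-k truncated-SVD approximation of X -- i.e. the term used in
    the h = h + h*V^T*U update above.
    """
    U_svd, S, Vt_svd = np.linalg.svd(X, full_matrices=False)
    sqrt_S = np.sqrt(S[:k])
    V = sqrt_S[:, None] * U_svd[:, :k].T  # rows: sqrt(s_i) * u_i
    U = sqrt_S[:, None] * Vt_svd[:k, :]   # rows: sqrt(s_i) * v_i
    return U, V
```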
In practice the "baseline" data matrix will be subtracted from `A` and `B` in the same way as we did for the control vectors, and clearly this is going to need some kind of careful regularisation to work in practice, but I think it should be a good starting point to try.
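Putting it together, a rough end-to-end sketch under the assumptions above (the baseline handling here is my reading of the description, not code from the project):

```python
import numpy as np

m, n, k = 1024, 256, 8
baseline = np.random.randn(m, n)          # stand-in baseline-prompt samples
A = np.random.randn(m, n) - baseline      # negative prompts, baseline-subtracted
B = np.random.randn(m, n) - baseline      # positive prompts, baseline-subtracted

X, *_ = np.linalg.lstsq(A, B, rcond=None) # least-squares solve of A @ X = B
U, V = truncated_factors(X, k)            # from the sketch above

h = np.random.randn(n)                    # one hidden state at inference time
h_new = h + h @ V.T @ U                   # the proposed h = h + h*V^T*U update
```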