I had a hard time wrapping my head around the matrix chain rule in both the digital and printed versions, and there are a couple of typos, or at least inconsistencies, in both (I think; I'm not an expert, but it just didn't click for me).
The whole crusade started because I was more or less getting the core idea, but I couldn't figure out a) why the same process leaves the weights in a different spot in the calculation, and b) how the heck the Wᵀ came about. The more I read through the appendix, the less I understood.
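For reference, here is my best guess at what the appendix is computing (this is an assumption on my part, so please correct me if I've misread it): a forward pass N = XW, S = σ(N), L = Λ(S). If that's right, I'd expect the chain rule to read

$$
\frac{\partial L}{\partial N} = \frac{\partial L}{\partial S} \odot \sigma'(N),
\qquad
\frac{\partial L}{\partial X} = \frac{\partial L}{\partial N}\, W^{\mathsf T},
$$

with the transpose showing up simply because the shapes have to match: ∂L/∂N has the shape of N (rows × outputs), Wᵀ is (outputs × inputs), and their product has the shape of X. If that's the whole story, saying so explicitly in the text would have saved me a lot of head-scratching.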
I did go through pp. 221-224 multiple times, but frankly, having forgotten most of what I knew about derivatives beyond the basics, the inconsistent symbols in a couple of places and (to my untrained eye) a couple of typos here and there made it not only hard but also frustrating.
Specifically, on p. 221 there is no statement of what is actually being calculated (it wouldn't hurt to restate it, since I learn visually), and on p. 223 we calculate dLdX(S), which you have to guess is equal to (or literally is) the dLdX(X) that appears a page later. What is the difference, and if there is no difference, why use two different symbols? Earlier, in Chapter 1, it was reinforced that dLdu(S) is a matrix of ones, but then on p. 224 it is a whole other matrix (I get where it came from, but the transition could be spelled out). I also tried going through the digital version and it wasn't any better. Leaving aside the zoom issue, there are things like dLdu(N) = dLdu(N) (why?), and later, again, dLdX(S) is this big matrix that is then apparently equivalent to dLdX(X), which is also equivalent to dLdu(S), which is in turn (somehow?) equivalent to dLdu(S) multiplied by Wᵀ. That last point in particular could be a little clearer.
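For what it's worth, the thing that would have made it click for me is a tiny numeric check. Here is a sketch under my own assumptions (N = X @ W, S = sigmoid(N), L = sum of the entries of S; the variable names are mine, not the book's), comparing the chain-rule gradient with finite differences:

```python
import numpy as np

np.random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss(X, W):
    # Assumed forward pass: N = XW, S = sigmoid(N), L = sum of all entries of S
    return sigmoid(X @ W).sum()

X = np.random.randn(3, 4)   # input batch: 3 rows, 4 features
W = np.random.randn(4, 2)   # weights: 4 inputs, 2 outputs

# Chain rule as I understand it:
#   dL/dS is all ones (L just sums S),
#   dL/dN = dL/dS * sigmoid'(N)  elementwise,
#   dL/dX = dL/dN @ W.T          <- this is where the transpose appears.
N = X @ W
dLdS = np.ones_like(N)
dLdN = dLdS * sigmoid(N) * (1 - sigmoid(N))
dLdX = dLdN @ W.T

# Numerical check: nudge each entry of X and see how L changes.
eps = 1e-6
dLdX_num = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        X_plus, X_minus = X.copy(), X.copy()
        X_plus[i, j] += eps
        X_minus[i, j] -= eps
        dLdX_num[i, j] = (loss(X_plus, W) - loss(X_minus, W)) / (2 * eps)

print(np.allclose(dLdX, dLdX_num, atol=1e-5))  # prints True
```

At least in this toy version, the "matrix of ones" (dL/dS) and the Wᵀ both live in the same expression; they are just successive factors of the chain rule, which is exactly the transition I was missing between pp. 221 and 224.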
I get that this is trivial to you, but I'm trying to learn after a fairly long break from advanced math. If the book really is meant to be "from scratch", and it stresses how important it is to grasp the concepts (and how gradients are actually calculated is surely one of the important ones), it would be beneficial to have the math extra clear so that everybody is on an even playing field. Somebody who has only just heard about deep learning and took the book off the shelf to give it a go could be really put off, and maybe discouraged from the whole idea, by something like this despite reading diligently. I hope I didn't come off as venting; I just really want a good grasp of this subject so I can feel confident about coding it myself later, because I understand the background. Well, back to reading; maybe the coding part will clear it up for me.