
Clearing up my confusion on symbols in Chapter 1 and Appendix A. #2

matmaq1 opened this issue Nov 25, 2019 · 0 comments

matmaq1 commented Nov 25, 2019

I had a hard time wrapping my head around the matrix chain rule in both the digital and printed versions, and there are a couple of typos, or at least inconsistencies, in both (I think; I'm not an expert, but it just didn't click for me).

The whole crusade started because I was more or less getting the core idea, but I couldn't figure out a) why the same process left the weights in a different spot in the calculation and b) how the heck the W^T came about. The more I read through the appendix, the less I understood.
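For reference, here is the setup as I currently understand it (my own sketch, in my own notation, which may not exactly match the book's):

```latex
S = XW, \qquad L = \Lambda(S), \qquad
\frac{\partial L}{\partial X} \;=\; \frac{\partial L}{\partial S}\, W^{T}
```

so I suspect the W^T shows up from differentiating the matrix product, but I couldn't tell from the text whether that is actually what is meant.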

I did go through pp. 221-224 multiple times, but frankly, having forgotten most of what I once knew about derivatives beyond the basics, the inconsistent symbols in a couple of places and (to my untrained eye) a couple of typos here and there made it not only hard but also frustrating.

Specifically:

- On p. 221 there is no statement of what is actually being calculated (it wouldn't hurt to remind the reader, as I learn visually).
- On p. 223 we calculate dLdX(S), which a page later has to be guessed to be equal to (or literally be) dLdX(X). What is the difference, or, if there is no difference, why use two different symbols?
- Earlier, in Chapter 1, it was reinforced that dLdu(S) is a matrix of ones, but then on p. 224 it is a whole other matrix. I get where it came from, but the transition could be underlined.
- I even tried going through the digital version and it wasn't better. Leaving aside the zoom issue, there are things like dLdu(N) = dLdu(N) (why?), or, later, dLdX(S) again being this big matrix and then apparently being equivalent to dLdX(X), which is also equivalent to dLdu(S), which is also (???) equivalent to dLdu(S) multiplied by W^T. That last point could also be a little clearer.
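To check that I at least have the mechanics right, I put together a tiny NumPy experiment on my own (this is just my sketch, not the book's code, and dLdS / dLdX are my own shorthand): for S = X W and L = sum(S), the gradient with respect to X does come out as the matrix of ones multiplied by W^T, which is what I think the appendix is trying to say, but I'm not sure.

```python
import numpy as np

# My own sanity check, not code from the book; dLdS / dLdX are just my shorthand.
np.random.seed(0)
X = np.random.randn(3, 4)
W = np.random.randn(4, 2)

S = X @ W          # forward pass: matrix multiplication, S has shape (3, 2)
L = S.sum()        # the simple "sum everything" loss

dLdS = np.ones_like(S)   # for L = sum(S), dL/dS is a matrix of ones
dLdX = dLdS @ W.T        # chain rule for the matrix product: dL/dX = dL/dS @ W^T

# Finite-difference check of one entry, to convince myself the W^T is right
eps = 1e-6
X_bumped = X.copy()
X_bumped[0, 0] += eps
numerical = ((X_bumped @ W).sum() - L) / eps
print(dLdX[0, 0], numerical)   # these two numbers should agree closely
```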

I get that this is trivial to you, but I'm trying to learn after a fairly long break from advanced math in general, and if the book really is supposed to be 'from scratch', where it is underlined how important it is to grasp the concepts (and I guess how gradients are actually calculated is pretty important), it would be beneficial to have the math extra clear so everybody plays on a level playing field. Somebody who has just heard about deep learning and took the book off the shelf to give it a go could be really put off, and maybe discouraged from the whole idea, by something like this despite reading diligently. I hope I didn't come off as venting - I just really want to have a good grasp of this subject so I can feel confident about coding it myself later, understanding the background. Well, back to reading; maybe the coding part will clear it up for me.
