4.3 Training a Multilayer Neural Network in PyTorch (PART 1-5) #105
Replies: 1 comment 1 reply
-
That's a good question and observation. From a conceptual perspective, it makes sense to show the softmax function, and it's also necessary for optimization. However, if you only need the class label, then applying argmax to the logits directly is computationally more efficient. That's because softmax is a monotonic function, which means the largest logit value also corresponds to the largest softmax value. Long story short, since we don't need to compute gradients in the accuracy function, we can omit the softmax to save some computation time, like you suggested. You have a great eye for detail! Let me know if you have any follow-up questions!
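For illustration, here is a minimal sketch of what such an accuracy function could look like; the `model` and `dataloader` arguments are placeholders, not the exact code from the unit:

```python
import torch


def compute_accuracy(model, dataloader):
    # Hypothetical accuracy helper; assumes `model` returns logits
    # and `dataloader` yields (features, labels) batches.
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():  # no gradients needed for evaluation
        for features, labels in dataloader:
            logits = model(features)
            # argmax over the logits directly; applying softmax first would
            # not change which index is largest, since softmax is monotonic
            predictions = torch.argmax(logits, dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.shape[0]
    return correct / total
```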
-
The computation graph for softmax regression, as explained in Unit 4.2, is as follows:
u = X · Weight^T -> z = u + bias -> activation = softmax(z) -> argmax(activation) -> L = cross_entropy(z, output_label)
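For example, one way to write this graph in PyTorch (with made-up shapes and random weights, not the course's actual code) would be:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(1)
X = torch.randn(4, 3)                    # 4 examples, 3 features (made-up shapes)
weight = torch.randn(2, 3)               # 2 classes
bias = torch.randn(2)
output_label = torch.tensor([0, 1, 1, 0])

u = X @ weight.T                         # u = X · Weight^T
z = u + bias                             # net inputs (logits)
activation = F.softmax(z, dim=1)         # class-membership probabilities
loss = F.cross_entropy(z, output_label)  # cross_entropy expects logits

# argmax on the probabilities vs. argmax on the logits gives the same indices
print(torch.argmax(activation, dim=1))
print(torch.argmax(z, dim=1))
```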
In the definition of compute_accuracy, argmax is applied directly to the net inputs (logits). Shouldn't the net inputs first be passed through softmax and then through argmax?
I understand that it will not change the outcome, since argmax returns the index of the maximum probability. So is the softmax activation excluded from the computation to save compute cycles, or is there another reason for not including it?
Please help me understand the discrepancy between the computation graph and the implementation of the compute_accuracy function.