4.3 Training a Multilayer Neural Network in PyTorch (PART 1-5) #105
Replies: 1 comment 1 reply
-
That's a good question and observation. From a conceptual perspective, it makes sense to show the softmax function, and it's also necessary for optimization. However, if you only need the class label, then applying argmax to the logits directly is computationally more efficient. That's because softmax is a monotonic function, which means the largest logit value also corresponds to the largest softmax value. Long story short, since we don't need to compute gradients in the accuracy function, we can omit the softmax to save some computation time, like you suggested. You have a great eye for detail! Let me know if you have any follow-up questions!
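For illustration, here is a minimal sketch of what such an accuracy function could look like; the `model` and `dataloader` arguments are placeholders, not the exact code from the unit:

```python
import torch


def compute_accuracy(model, dataloader):
    # Hypothetical accuracy helper; assumes `model` returns logits
    # and `dataloader` yields (features, labels) batches.
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():  # no gradients needed for evaluation
        for features, labels in dataloader:
            logits = model(features)
            # argmax over the logits directly; applying softmax first would
            # not change which index is largest, since softmax is monotonic
            predictions = torch.argmax(logits, dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.shape[0]
    return correct / total
```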
-
The computation graph for softmax regression, as explained in Unit 4.2, is as follows:
u = X · Weight^T -> z = u + bias -> activation = softmax(z) -> argmax(activation) -> L = cross_entropy(z, output_label)
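For example, one way to write this graph in PyTorch (with made-up shapes and random weights, not the course's actual code) would be:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(1)
X = torch.randn(4, 3)                    # 4 examples, 3 features (made-up shapes)
weight = torch.randn(2, 3)               # 2 classes
bias = torch.randn(2)
output_label = torch.tensor([0, 1, 1, 0])

u = X @ weight.T                         # u = X · Weight^T
z = u + bias                             # net inputs (logits)
activation = F.softmax(z, dim=1)         # class-membership probabilities
loss = F.cross_entropy(z, output_label)  # cross_entropy expects logits

# argmax on the probabilities vs. argmax on the logits gives the same indices
print(torch.argmax(activation, dim=1))
print(torch.argmax(z, dim=1))
```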
In the definition of compute_accuracy, argmax is applied directly to the net inputs (logits). Shouldn't the net inputs first be passed through softmax and then through argmax?
I understand that it will not change the outcome, since argmax returns the index of the maximum probability. So is the softmax activation excluded from the computation to save compute cycles, or is there another reason for not including it?
Please help me understand the discrepancy between the computation graph and the implementation of the compute_accuracy function.