Couldn't get the model_summary for slowfast_r50 model #730

RenurajYennawar opened this issue Sep 26, 2024 · 0 comments

In the script below, I tried to print the model summary for the slowfast_r50 model. Printing the model itself gives the architecture correctly:
import torch
import json
from torchsummary import summary
from torchvision.transforms import Compose, Lambda
from torchvision.transforms._transforms_video import (
    CenterCropVideo,
    NormalizeVideo,
)
from pytorchvideo.data.encoded_video import EncodedVideo
from pytorchvideo.transforms import (
    ApplyTransformToKey,
    ShortSideScale,
    UniformTemporalSubsample,
    UniformCropVideo,
)
from typing import Dict

# Device on which to run the model
# Set to "cuda" to load on GPU
device = "cpu"

# Pick a pretrained model and load the pretrained weights

model_name = "slowfast_r50"
model = torch.hub.load("facebookresearch/pytorchvideo", model=model_name, pretrained=True)
print(model)
slow_input_size = [1, 3, 8, 256, 256]
fast_input_size = [1, 3, 32, 256, 256]

input_size = [slow_input_size, fast_input_size]

model_summary = summary(model, input_size)

print(model_summary)

# Save the entire model (including the architecture)

torch.save(model, "slowfast_r50_full_model.pth")

# Set to eval mode and move to the desired device

model = model.to(device)
model = model.eval()
with open("/home/mantra/Documents/Projects/Video/pytorchvideo_tutorial/kinetics_classnames.json", "r") as f:
    kinetics_classnames = json.load(f)

# Create an id-to-class-name mapping

kinetics_id_to_classname = {}
for k, v in kinetics_classnames.items():
    kinetics_id_to_classname[v] = str(k).replace('"', "")
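For illustration (this is an assumption about the JSON contents, based on the Kinetics class-name file used in the PyTorchVideo tutorial), the loop above just inverts the name-to-id mapping so a predicted class index can be turned back into a label:

# Hypothetical contents, for illustration only:
# kinetics_classnames      = {"abseiling": 0, "air drumming": 1, ...}
# kinetics_id_to_classname = {0: "abseiling", 1: "air drumming", ...}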

side_size = 256
mean = [0.45, 0.45, 0.45]
std = [0.225, 0.225, 0.225]
crop_size = 256
num_frames = 32
sampling_rate = 2
frames_per_second = 30
alpha = 4

class PackPathway(torch.nn.Module):
    """
    Transform for converting video frames into a list of tensors, one per pathway.
    """
    def __init__(self):
        super().__init__()

    def forward(self, frames: torch.Tensor):
        fast_pathway = frames
        # Perform temporal sampling from the fast pathway.
        slow_pathway = torch.index_select(
            frames,
            1,
            torch.linspace(
                0, frames.shape[1] - 1, frames.shape[1] // alpha
            ).long(),
        )
        frame_list = [slow_pathway, fast_pathway]
        return frame_list
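For reference, a sketch of the shape arithmetic (assuming the clip reaching PackPathway has shape (C, T, H, W) = (3, 32, 256, 256) after the preceding transforms):

# frames (fast pathway): (3, 32, 256, 256)
# slow pathway:          (3,  8, 256, 256)  # index_select on dim 1 keeps T // alpha = 32 // 4 = 8 frames
# With a batch dimension added later, these match [1, 3, 8, 256, 256] and [1, 3, 32, 256, 256].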

transform = ApplyTransformToKey(
    key="video",
    transform=Compose(
        [
            UniformTemporalSubsample(num_frames),
            Lambda(lambda x: x / 255.0),
            NormalizeVideo(mean, std),
            ShortSideScale(size=side_size),
            CenterCropVideo(crop_size),
            PackPathway(),
        ]
    ),
)

# The duration of the input clip is also specific to the model.

clip_duration = (num_frames * sampling_rate)/frames_per_second
print(clip_duration)
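Worked out with the values above:

# clip_duration = (32 * 2) / 30 = 64 / 30 ≈ 2.13 seconds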

# Load the example video

video_path = "/home/mantra/Documents/Projects/Video/pytorchvideo_tutorial/-1HT31BzADs_000118_000128.mp4"

# Select the duration of the clip to load by specifying the start and end duration.
# The start_sec should correspond to where the action occurs in the video.

start_sec = 0
end_sec = start_sec + clip_duration

# Initialize an EncodedVideo helper class

video = EncodedVideo.from_path(video_path)

# Load the desired clip

video_data = video.get_clip(start_sec=start_sec, end_sec=end_sec)
print('video_data')
print(type(video_data['video']))
print(video_data['video'].shape)

print(type(video_data['audio']))
print(video_data['audio'].shape)
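For reference (an assumption based on EncodedVideo's documented clip format, not output shown in this issue):

# video_data["video"] is expected to be a (C, T, H, W) tensor of decoded frames,
# and video_data["audio"] a 1-D tensor of audio samples for the selected clip.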

print('video data keys')
for key, value in video_data.items():
    print(key)

# Apply a transform to normalize the video input

video_data = transform(video_data)
print(video_data)
print(video_data['video'][0])

# Move the inputs to the desired device

inputs = video_data["video"]
inputs = [i.to(device)[None, ...] for i in inputs]
print(inputs[0].shape)
print(inputs[1].shape)

# Pass the input clip through the model

preds = model(inputs)
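Not part of the original script, but for completeness, a minimal sketch of how the predictions are usually decoded with the kinetics_id_to_classname mapping built above (assuming preds are raw per-class scores of shape [1, 400] for Kinetics-400):

# post_act = torch.nn.Softmax(dim=1)
# probs = post_act(preds)
# pred_classes = probs.topk(k=5).indices[0]
# pred_class_names = [kinetics_id_to_classname[int(i)] for i in pred_classes]
# print("Top 5 predicted labels: %s" % ", ".join(pred_class_names))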

# For reference, the shapes printed for inputs[0] and inputs[1] are:
# torch.Size([1, 3, 8, 256, 256])
# torch.Size([1, 3, 32, 256, 256])

model_summary = summary(model, (1, 3, 8, 256, 256))
model_summary = summary(model, [[1, 3, 8, 256, 256], [1, 3, 32, 256, 256]])
print(model_summary)

But when printing the model_summary here, I am getting the following error:
Traceback (most recent call last):
  File "/home/mantra/Documents/Projects/Video/pytorchvideo_tutorial/slowfast_arch.py", line 14, in <module>
    summary(model, input_sizes )
  File "/home/mantra/miniconda3/envs/video/lib/python3.12/site-packages/torchsummary/torchsummary.py", line 72, in summary
    model(*x)
  File "/home/mantra/miniconda3/envs/video/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mantra/miniconda3/envs/video/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Net.forward() takes 2 positional arguments but 3 were given
I have tried to resolve this error in multiple ways but couldn't get the expected output. Can anyone please help me resolve this?
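For context on what the traceback is saying: torchsummary builds one dummy tensor per entry of input_size and calls model(*x) (visible in the traceback), so with two sizes it passes the slow and fast tensors as two separate positional arguments, while the SlowFast Net.forward expects a single list of tensors. A minimal sketch of one possible workaround (the wrapper class and sizes below are illustrative, not from the original script; torchsummary is assumed to prepend its own batch dimension to each size):

class SlowFastSummaryWrapper(torch.nn.Module):
    """Illustrative wrapper: accepts the two pathways as separate arguments
    and re-packs them into the single list that SlowFast's forward expects."""
    def __init__(self, slowfast_model):
        super().__init__()
        self.slowfast_model = slowfast_model

    def forward(self, slow, fast):
        return self.slowfast_model([slow, fast])

# torchsummary expects per-sample sizes (no batch dimension) and adds a dummy batch itself.
summary(SlowFastSummaryWrapper(model), [(3, 8, 256, 256), (3, 32, 256, 256)], device="cpu")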
