TartanVO Training - Uncertain Problems in Stage Two #21

Open
Zilong-L opened this issue Aug 7, 2024 · 8 comments

@Zilong-L

Zilong-L commented Aug 7, 2024

Hi,
First of all, I really appreciate the incredible work you've done on TartanVO. It's a fantastic contribution to the field.

I've been working on replicating the training code for TartanVO, and I'm running into some issues. In the first stage, where the goal is to predict pose from ground-truth (GT) flow, the performance seems acceptable: even though the result is not as good as the 1914.pkl checkpoint provided by the authors, the loss decreases normally throughout training.
[image: stage-one training curve]

However, in the second phase, where PWC-Net and FlowPoseNet are connected and trained jointly, I'm running into difficulties and I'm not sure where the problem lies. As far as I can tell, the loss is not decreasing. I would greatly appreciate any assistance or insights you can provide.
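One sanity check I'm considering (a minimal sketch of my own, not something from the original TartanVO training code): freeze the pretrained PWC-Net in stage two so that only FlowPoseNet is updated. If the pose loss still does not move, the problem is more likely in my data or loss pipeline than in the joint fine-tuning itself.

import torch

# Freeze the flow branch; only FlowPoseNet parameters receive gradients.
# ddp_model and learning_rate are the same objects used in the training
# script further down.
for p in ddp_model.module.flowNet.parameters():
    p.requires_grad = False

# Rebuild the optimizer over the remaining trainable parameters.
optimizer = torch.optim.Adam(
    [p for p in ddp_model.parameters() if p.requires_grad],
    lr=learning_rate)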

Current Results

Training result for the pose-only (stage one) training

[image: stage-one (pose-only) training curve]

Training result for stage two.

I don't see a decrease.
[image: stage-two training curve]

Details of my setup

Loss Calculation Method:

These are what I found in the COMPASS repo linked from issue #14.

  1. Total loss
flow_loss = ddp_model.module.flowNet.get_loss(flow,flow_gt,small_scale=True)
pose_loss,trans_loss,rot_loss = ddp_model.module.flowPoseNet.linear_norm_trans_loss(relative_motion, motions_gt)
total_loss = flow_loss*lambda_flow + pose_loss
  2. Pose loss (see the normalization note after this list)
    def linear_norm_trans_loss(self, output, motion, mask=None):
        output_trans = output[:, :3]
        output_rot = output[:, 3:]

        trans_norm = torch.norm(output_trans, dim=1).view(-1, 1)
        output_norm = output_trans/trans_norm

        if mask is None:
            trans_loss = self.criterion(output_norm, motion[:, :3])
            rot_loss = self.criterion(output_rot, motion[:, 3:])
        else:
            trans_loss = self.criterion(output_norm[mask,:], motion[mask, :3])
            rot_loss = self.criterion(output_rot[mask,:], motion[mask, 3:])

        loss = (rot_loss + trans_loss)/2.0

        return loss, trans_loss.item() , rot_loss.item()
  3. Flow loss
    def get_loss(self, output, target,  small_scale=False):
        '''
        return flow loss
        '''
        criterion = self.criterion
        if self.training:
            target4, target8, target16, target32, target64 = self.scale_targetflow(target, small_scale)
            loss1 = criterion(output[0], target4)
            loss2 = criterion(output[1], target8)
            loss3 = criterion(output[2], target16)
            loss4 = criterion(output[3], target32)
            loss5 = criterion(output[4], target64)
            loss = (loss1 + loss2 + loss3 + loss4 + loss5)/5.0
        else:
            if small_scale:
                output4 = output[0]
            else:
                output4 = F.interpolate(output[0], scale_factor=4, mode='bilinear', align_corners=True)# /4.0
            loss = criterion(output4, target)
        return loss
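One thing I'm double-checking (referenced in the pose loss item above): linear_norm_trans_loss normalizes only the predicted translation, so the ground-truth translation in motions_gt has to be unit-norm as well, matching the up-to-scale loss in the TartanVO paper. I'm not sure yet whether my dataloader already does this; if it doesn't, a minimal sketch of the fix would be:

import torch

def normalize_gt_translation(motions_gt, eps=1e-6):
    # Make the GT translation unit-norm so it is directly comparable with the
    # normalized prediction inside linear_norm_trans_loss (up-to-scale loss).
    trans = motions_gt[:, :3]
    norm = torch.norm(trans, dim=1, keepdim=True).clamp(min=eps)
    return torch.cat([trans / norm, motions_gt[:, 3:]], dim=1)

# e.g. motions_gt = normalize_gt_translation(sample['motion'])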

Dataset Preparation

Other than the transformations, I use basically the same configuration as in this repo, and again the transform below is what I took from the COMPASS repo.

transform = Compose([CropCenter((height,width)), RandomResizeCrop(size=(448, 640)),DownscaleFlow(),ToTensor()])
class RandomResizeCrop(object):
    """
    Randomly scale to cover a continuous range of focal lengths.
    Since the TartanAir focal length is already small, we only upscale the image.

    """

    def __init__(self, size, max_scale=2.5, keep_center=False, fix_ratio=False, scale_disp=False):
        '''
        size: output frame size, this should be NO LARGER than the input frame size! 
        scale_disp: when training the stereovo, disparity represents depth, which is not scaled with resize 
        '''
        if isinstance(size, numbers.Number):
            self.target_h = int(size)
            self.target_w = int(size)
        else:
            self.target_h = size[0]
            self.target_w = size[1]

        # self.max_focal = max_focal
        self.keep_center = keep_center
        self.fix_ratio = fix_ratio
        self.scale_disp = scale_disp
        # self.tartan_focal = 320.

        # assert self.max_focal >= self.tartan_focal
        self.scale_base = max_scale #self.max_focal /self.tartan_focal

    def __call__(self, sample): 
        for kk in sample:
            if len(sample[kk].shape)>=2:
                h, w = sample[kk].shape[0], sample[kk].shape[1]
                break
        self.target_h = min(self.target_h, h)
        self.target_w = min(self.target_w, w)

        scale_w, scale_h, x1, y1, crop_w, crop_h = generate_random_scale_crop(h, w, self.target_h, self.target_w, 
                                                    self.scale_base, self.keep_center, self.fix_ratio)

        for kk in sample:
            # if kk in ['flow', 'flow2', 'img0', 'img0n', 'img1', 'img1n', 'intrinsic', 'fmask', 'disp0', 'disp1', 'disp0n', 'disp1n']:
            if len(sample[kk].shape)>=2 or kk in ['fmask', 'fmask2']:
                sample[kk] = sample[kk][y1:y1+crop_h, x1:x1+crop_w]
                sample[kk] = cv2.resize(sample[kk], (0,0), fx=scale_w, fy=scale_h, interpolation=cv2.INTER_LINEAR)
                # Note: OpenCV drops the last dimension if it is one
                sample[kk] = sample[kk][:self.target_h,:self.target_w]

        # scale the flow
        if 'flow' in sample:
            sample['flow'][:,:,0] = sample['flow'][:,:,0] * scale_w
            sample['flow'][:,:,1] = sample['flow'][:,:,1] * scale_h
        # scale the flow
        if 'flow2' in sample:
            sample['flow2'][:,:,0] = sample['flow2'][:,:,0] * scale_w
            sample['flow2'][:,:,1] = sample['flow2'][:,:,1] * scale_h

        if self.scale_disp: # scale the depth
            if 'disp0' in sample:
                sample['disp0'][:,:] = sample['disp0'][:,:] * scale_w
            if 'disp1' in sample:
                sample['disp1'][:,:] = sample['disp1'][:,:] * scale_w
            if 'disp0n' in sample:
                sample['disp0n'][:,:] = sample['disp0n'][:,:] * scale_w
            if 'disp1n' in sample:
                sample['disp1n'][:,:] = sample['disp1n'][:,:] * scale_w
        else:
            sample['scale_w'] = np.array([scale_w ])# used in e2e-stereo-vo

        return sample
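As a quick alignment check on this augmentation chain, I run a dummy sample through it and just look at the output shapes (a sketch; the import path and the 448x640 crop size are assumptions about my own setup, and the sample keys follow the ones used in process_whole_sample below):

import numpy as np
# Hypothetical import path; adjust to wherever these classes live in your code.
from Datasets.utils import Compose, CropCenter, RandomResizeCrop, DownscaleFlow, ToTensor

# Dummy sample mimicking one TartanAir training pair before augmentation.
sample = {
    'img1':      np.random.rand(480, 640, 3).astype(np.float32),
    'img2':      np.random.rand(480, 640, 3).astype(np.float32),
    'flow':      (np.random.rand(480, 640, 2).astype(np.float32) - 0.5) * 20,
    'intrinsic': np.random.rand(480, 640, 2).astype(np.float32),
    'motion':    np.random.rand(6).astype(np.float32),
}

transform = Compose([CropCenter((448, 640)), RandomResizeCrop(size=(448, 640)),
                     DownscaleFlow(), ToTensor()])
out = transform(sample)
for k, v in out.items():
    # flow and intrinsic should come out at 1/4 of the 448x640 resolution
    print(k, getattr(v, 'shape', v))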

Some Hyperparameters

learning_rate = 0.0001
lambda_flow = 1
scheduler setup:
def lr_lambda(iteration):
    if iteration < 0.5 * total_iterations:
        return 1.0
    elif iteration < 0.875 * total_iterations:
        return 0.2
    else:
        return 0.04
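For completeness, this is how the schedule is wired up (a sketch; I use Adam here as a placeholder optimizer). LambdaLR multiplies the base learning rate by the returned factor, so the effective LR steps from 1e-4 to 2e-5 to 4e-6 over the run:

import torch

optimizer = torch.optim.Adam(ddp_model.parameters(), lr=learning_rate)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_lambda)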

Main training logic

    while iteration < total_iterations:
        for sample in train_dataloader:
            ddp_model.train()
            optimizer.zero_grad()  # Zero the parameter gradients
            if iteration >= total_iterations:
                print(f"Successfully completed training for {iteration} iterations")
                break
            
            total_loss,flow_loss,pose_loss,trans_loss,rot_loss = process_whole_sample(ddp_model,sample,lambda_flow,device_id)
            # backpropagation----------------------------------------------------------
            total_loss.backward()
            optimizer.step()
            scheduler.step()

            iteration += 1
def process_whole_sample(ddp_model,sample,lambda_flow,device_id):
    sample = {k: v.to(device_id) for k, v in sample.items()} 
    # inputs-------------------------------------------------------------------
    img1 = sample['img1']
    img2 = sample['img2']
    intrinsic_layer = sample['intrinsic']
        
    # forward------------------------------------------------------------------
    flow, relative_motion = ddp_model([img1,img2,intrinsic_layer])


    # loss calculation---------------------------------------------------------
    flow_gt = sample['flow']
    motions_gt = sample['motion']
    flow_loss = ddp_model.module.flowNet.get_loss(flow,flow_gt,small_scale=True)
    pose_loss,trans_loss,rot_loss = ddp_model.module.flowPoseNet.linear_norm_trans_loss(relative_motion, motions_gt)
    total_loss = flow_loss*lambda_flow + pose_loss
    
    return total_loss,flow_loss,pose_loss,trans_loss,rot_loss
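To see which term stalls, I'm also adding per-term logging and a gradient-norm check to the loop (a sketch; the clip_grad_norm_ call goes between total_loss.backward() and optimizer.step(), and with a huge max_norm it only measures the norm instead of clipping):

import torch

# Inside the training loop, right after total_loss.backward():
grad_norm = torch.nn.utils.clip_grad_norm_(ddp_model.parameters(), max_norm=1e9)
if iteration % 100 == 0:
    print(f"iter {iteration}: lr {scheduler.get_last_lr()[0]:.2e} "
          f"total {total_loss.item():.4f} flow {flow_loss.item():.4f} "
          f"pose {pose_loss.item():.4f} trans {trans_loss:.4f} "
          f"rot {rot_loss:.4f} grad {float(grad_norm):.2f}")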

Is there any other information I should provide to help diagnose the issue? Thank you in advance for your help!

I apologize for the long post, but I'm at a point where I can't make further progress and really hope to get support from the community. My humble implementation is available here.

A million thanks,

Best regards.

@Zilong-L Zilong-L changed the title TartanVO Training - Uncertain Problems in Stage Two - Need help. TartanVO Training - Uncertain Problems in Stage Two Aug 7, 2024
@doujiarui

Have you loaded the pretrained PWC-Net?

@Zilong-L

Zilong-L commented Sep 3, 2024

Have you loaded the pretrained PWC-Net?

Yes sir, I loaded this pretrained checkpoint:
https://github.com/NVlabs/PWC-Net/blob/master/PyTorch/pwc_net_chairs.pth.tar
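For reference, this is roughly how I load and sanity-check it (a sketch; the checkpoint key names may not match my wrapped flow net exactly, which is what the missing/unexpected counts are meant to reveal):

import torch

ckpt = torch.load('pwc_net_chairs.pth.tar', map_location='cpu')
state = ckpt.get('state_dict', ckpt) if isinstance(ckpt, dict) else ckpt
result = ddp_model.module.flowNet.load_state_dict(state, strict=False)
print(f"missing keys: {len(result.missing_keys)}, "
      f"unexpected keys: {len(result.unexpected_keys)}")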

@doujiarui

doujiarui commented Sep 3, 2024

Could you please provide train_pose.txt and val_pase.txt? I also want to implement them. Thanks!

@Zilong-L

Zilong-L commented Sep 3, 2024

Could you please provide train_pose.txt and val_pase.txt? I also want to implement them. Thanks!

Off topic, but yes: there is a Python script in the repo that generates the reference pose file list:
utils/list_files.py

You need to download the TartanAir dataset first and run the script in that data folder. You will get something like this:

/home/lzl/code/python/tartanvo_train_implementation/data/soulcity/Easy/P012/pose_left.txt
/home/lzl/code/python/tartanvo_train_implementation/data/soulcity/Easy/P007/pose_left.txt
/home/lzl/code/python/tartanvo_train_implementation/data/soulcity/Easy/P005/pose_left.txt
/home/lzl/code/python/tartanvo_train_implementation/data/soulcity/Easy/P011/pose_left.txt
/home/lzl/code/python/tartanvo_train_implementation/data/soulcity/Easy/P000/pose_left.txt
/home/lzl/code/python/tartanvo_train_implementation/data/soulcity/Easy/P008/pose_left.txt
/home/lzl/code/python/tartanvo_train_implementation/data/soulcity/Easy/P002/pose_left.txt
/home/lzl/code/python/tartanvo_train_implementation/data/soulcity/Easy/P003/pose_left.txt
/home/lzl/code/python/tartanvo_train_implementation/data/soulcity/Easy/P001/pose_left.txt
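For reference, the script boils down to something like this (a rough sketch; the actual utils/list_files.py may differ in details such as the output file name):

import os

root = 'data'  # folder where the TartanAir environments are extracted
with open('train_pose.txt', 'w') as f:
    for dirpath, _, filenames in os.walk(root):
        if 'pose_left.txt' in filenames:
            f.write(os.path.abspath(os.path.join(dirpath, 'pose_left.txt')) + '\n')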

@doujiarui


Thank you. One thing I don't quite understand is the division into training, validation, and test datasets. I see that DPVO provides a tartan_test.txt:
abandonedfactory/abandonedfactory/Easy/P011
abandonedfactory/abandonedfactory/Hard/P011
abandonedfactory_night/abandonedfactory_night/Easy/P013
abandonedfactory_night/abandonedfactory_night/Hard/P014
Amusement/amusement/Easy/P008, ........

I don't know how the split is done in TartanVO; I can't find any such files. Do we need to write them manually based on the split described in the paper?

@Zilong-L

Zilong-L commented Sep 3, 2024

I don't know how the split is done in TartanVO; I can't find any such files. Do we need to write them manually based on the split described in the paper?

Yes, the script only generates the meta-information; you need to split the dataset manually.
In the TartanVO paper they say they set aside three scenes for validation. That's how they split the dataset.

If you encounter anything related to my codebase that needs attention, please raise an issue there. This helps keep this thread focused.

@doujiarui

Thank you very much!

@zhangcv123

I have reproduced the relevant code and can be reached by email: [email protected]
