TartanVO Training - Uncertain Problems in Stage Two #21

Open
Zilong-L opened this issue Aug 7, 2024 · 8 comments

@Zilong-L

Zilong-L commented Aug 7, 2024

Hi,
First of all, I really appreciate the incredible work you've done on TartanVO. It's a fantastic contribution to the field.

I've been working on replicating the training code for TartanVO, and I'm running into some issues. In the first stage, where the goal is to predict pose from ground-truth (GT) flow, the performance seems acceptable: even though the result is not as good as the 1914.pkl checkpoint provided by the authors, the loss decreases normally throughout training.
[image: stage-one training curve]

However, in the second phase, where PWC-Net and FlowPoseNet are connected and trained jointly, I'm running into difficulties and I'm not sure where the problem lies. As far as I can tell, the loss is not decreasing. I would greatly appreciate any assistance or insights you can provide.
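One sanity check I'm considering (a minimal sketch of my own, not something from the original TartanVO training code): freeze the pretrained PWC-Net in stage two so that only FlowPoseNet is updated. If the pose loss still does not move, the problem is more likely in my data or loss pipeline than in the joint fine-tuning itself.

import torch

# Freeze the flow branch; only FlowPoseNet parameters receive gradients.
# ddp_model and learning_rate are the same objects used in the training
# script further down.
for p in ddp_model.module.flowNet.parameters():
    p.requires_grad = False

# Rebuild the optimizer over the remaining trainable parameters.
optimizer = torch.optim.Adam(
    [p for p in ddp_model.parameters() if p.requires_grad],
    lr=learning_rate)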

Current Results

Training result for the pose-only (stage one) training

[image: stage-one (pose-only) training curve]

Training result for stage two.

I don't see a decrease.
[image: stage-two training curve]

Details of my setup

Loss Calculation Method:

These are what I found in the COMPASS repo linked from issue #14.

  1. Total loss
flow_loss = ddp_model.module.flowNet.get_loss(flow,flow_gt,small_scale=True)
pose_loss,trans_loss,rot_loss = ddp_model.module.flowPoseNet.linear_norm_trans_loss(relative_motion, motions_gt)
total_loss = flow_loss*lambda_flow + pose_loss
  2. Pose loss (see the normalization note after this list)
    def linear_norm_trans_loss(self, output, motion, mask=None):
        output_trans = output[:, :3]
        output_rot = output[:, 3:]

        trans_norm = torch.norm(output_trans, dim=1).view(-1, 1)
        output_norm = output_trans/trans_norm

        if mask is None:
            trans_loss = self.criterion(output_norm, motion[:, :3])
            rot_loss = self.criterion(output_rot, motion[:, 3:])
        else:
            trans_loss = self.criterion(output_norm[mask,:], motion[mask, :3])
            rot_loss = self.criterion(output_rot[mask,:], motion[mask, 3:])

        loss = (rot_loss + trans_loss)/2.0

        return loss, trans_loss.item() , rot_loss.item()
  3. Flow loss
    def get_loss(self, output, target,  small_scale=False):
        '''
        return flow loss
        '''
        criterion = self.criterion
        if self.training:
            target4, target8, target16, target32, target64 = self.scale_targetflow(target, small_scale)
            loss1 = criterion(output[0], target4)
            loss2 = criterion(output[1], target8)
            loss3 = criterion(output[2], target16)
            loss4 = criterion(output[3], target32)
            loss5 = criterion(output[4], target64)
            loss = (loss1 + loss2 + loss3 + loss4 + loss5)/5.0
        else:
            if small_scale:
                output4 = output[0]
            else:
                output4 = F.interpolate(output[0], scale_factor=4, mode='bilinear', align_corners=True)# /4.0
            loss = criterion(output4, target)
        return loss
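One thing I'm double-checking (referenced in the pose loss item above): linear_norm_trans_loss normalizes only the predicted translation, so the ground-truth translation in motions_gt has to be unit-norm as well, matching the up-to-scale loss in the TartanVO paper. I'm not sure yet whether my dataloader already does this; if it doesn't, a minimal sketch of the fix would be:

import torch

def normalize_gt_translation(motions_gt, eps=1e-6):
    # Make the GT translation unit-norm so it is directly comparable with the
    # normalized prediction inside linear_norm_trans_loss (up-to-scale loss).
    trans = motions_gt[:, :3]
    norm = torch.norm(trans, dim=1, keepdim=True).clamp(min=eps)
    return torch.cat([trans / norm, motions_gt[:, 3:]], dim=1)

# e.g. motions_gt = normalize_gt_translation(sample['motion'])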

Dataset Preparation

Other than the transformations, I use basically the same configuration as in this repo, and again the transform below is what I took from the COMPASS repo.

transform = Compose([CropCenter((height,width)), RandomResizeCrop(size=(448, 640)),DownscaleFlow(),ToTensor()])
class RandomResizeCrop(object):
    """
    Randomly scale to cover a continuous range of focal lengths.
    Since the TartanAir focal length is already small, we only upscale the image.

    """

    def __init__(self, size, max_scale=2.5, keep_center=False, fix_ratio=False, scale_disp=False):
        '''
        size: output frame size, this should be NO LARGER than the input frame size! 
        scale_disp: when training the stereovo, disparity represents depth, which is not scaled with resize 
        '''
        if isinstance(size, numbers.Number):
            self.target_h = int(size)
            self.target_w = int(size)
        else:
            self.target_h = size[0]
            self.target_w = size[1]

        # self.max_focal = max_focal
        self.keep_center = keep_center
        self.fix_ratio = fix_ratio
        self.scale_disp = scale_disp
        # self.tartan_focal = 320.

        # assert self.max_focal >= self.tartan_focal
        self.scale_base = max_scale #self.max_focal /self.tartan_focal

    def __call__(self, sample): 
        for kk in sample:
            if len(sample[kk].shape)>=2:
                h, w = sample[kk].shape[0], sample[kk].shape[1]
                break
        self.target_h = min(self.target_h, h)
        self.target_w = min(self.target_w, w)

        scale_w, scale_h, x1, y1, crop_w, crop_h = generate_random_scale_crop(h, w, self.target_h, self.target_w, 
                                                    self.scale_base, self.keep_center, self.fix_ratio)

        for kk in sample:
            # if kk in ['flow', 'flow2', 'img0', 'img0n', 'img1', 'img1n', 'intrinsic', 'fmask', 'disp0', 'disp1', 'disp0n', 'disp1n']:
            if len(sample[kk].shape)>=2 or kk in ['fmask', 'fmask2']:
                sample[kk] = sample[kk][y1:y1+crop_h, x1:x1+crop_w]
                sample[kk] = cv2.resize(sample[kk], (0,0), fx=scale_w, fy=scale_h, interpolation=cv2.INTER_LINEAR)
                # Note: OpenCV drops the last dimension if it is one
                sample[kk] = sample[kk][:self.target_h,:self.target_w]

        # scale the flow
        if 'flow' in sample:
            sample['flow'][:,:,0] = sample['flow'][:,:,0] * scale_w
            sample['flow'][:,:,1] = sample['flow'][:,:,1] * scale_h
        # scale the flow
        if 'flow2' in sample:
            sample['flow2'][:,:,0] = sample['flow2'][:,:,0] * scale_w
            sample['flow2'][:,:,1] = sample['flow2'][:,:,1] * scale_h

        if self.scale_disp: # scale the depth
            if 'disp0' in sample:
                sample['disp0'][:,:] = sample['disp0'][:,:] * scale_w
            if 'disp1' in sample:
                sample['disp1'][:,:] = sample['disp1'][:,:] * scale_w
            if 'disp0n' in sample:
                sample['disp0n'][:,:] = sample['disp0n'][:,:] * scale_w
            if 'disp1n' in sample:
                sample['disp1n'][:,:] = sample['disp1n'][:,:] * scale_w
        else:
            sample['scale_w'] = np.array([scale_w ])# used in e2e-stereo-vo

        return sample
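As a quick alignment check on this augmentation chain, I run a dummy sample through it and just look at the output shapes (a sketch; the import path and the 448x640 crop size are assumptions about my own setup, and the sample keys follow the ones used in process_whole_sample below):

import numpy as np
# Hypothetical import path; adjust to wherever these classes live in your code.
from Datasets.utils import Compose, CropCenter, RandomResizeCrop, DownscaleFlow, ToTensor

# Dummy sample mimicking one TartanAir training pair before augmentation.
sample = {
    'img1':      np.random.rand(480, 640, 3).astype(np.float32),
    'img2':      np.random.rand(480, 640, 3).astype(np.float32),
    'flow':      (np.random.rand(480, 640, 2).astype(np.float32) - 0.5) * 20,
    'intrinsic': np.random.rand(480, 640, 2).astype(np.float32),
    'motion':    np.random.rand(6).astype(np.float32),
}

transform = Compose([CropCenter((448, 640)), RandomResizeCrop(size=(448, 640)),
                     DownscaleFlow(), ToTensor()])
out = transform(sample)
for k, v in out.items():
    # flow and intrinsic should come out at 1/4 of the 448x640 resolution
    print(k, getattr(v, 'shape', v))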

Some Hyperparameters

learning_rate = 0.0001
lambda_flow = 1
scheduler setup:
def lr_lambda(iteration):
    if iteration < 0.5 * total_iterations:
        return 1.0
    elif iteration < 0.875 * total_iterations:
        return 0.2
    else:
        return 0.04
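For completeness, this is how the schedule is wired up (a sketch; I use Adam here as a placeholder optimizer). LambdaLR multiplies the base learning rate by the returned factor, so the effective LR steps from 1e-4 to 2e-5 to 4e-6 over the run:

import torch

optimizer = torch.optim.Adam(ddp_model.parameters(), lr=learning_rate)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_lambda)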

Main training logic

    while iteration < total_iterations:
        for sample in train_dataloader:
            ddp_model.train()
            optimizer.zero_grad()  # Zero the parameter gradients
            if iteration >= total_iterations:
                print(f"Successfully completed training for {iteration} iterations")
                break
            
            total_loss,flow_loss,pose_loss,trans_loss,rot_loss = process_whole_sample(ddp_model,sample,lambda_flow,device_id)
            # backpropagation----------------------------------------------------------
            total_loss.backward()
            optimizer.step()
            scheduler.step()

            iteration += 1
def process_whole_sample(ddp_model,sample,lambda_flow,device_id):
    sample = {k: v.to(device_id) for k, v in sample.items()} 
    # inputs-------------------------------------------------------------------
    img1 = sample['img1']
    img2 = sample['img2']
    intrinsic_layer = sample['intrinsic']
        
    # forward------------------------------------------------------------------
    flow, relative_motion = ddp_model([img1,img2,intrinsic_layer])


    # loss calculation---------------------------------------------------------
    flow_gt = sample['flow']
    motions_gt = sample['motion']
    flow_loss = ddp_model.module.flowNet.get_loss(flow,flow_gt,small_scale=True)
    pose_loss,trans_loss,rot_loss = ddp_model.module.flowPoseNet.linear_norm_trans_loss(relative_motion, motions_gt)
    total_loss = flow_loss*lambda_flow + pose_loss
    
    return total_loss,flow_loss,pose_loss,trans_loss,rot_loss
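To see which term stalls, I'm also adding per-term logging and a gradient-norm check to the loop (a sketch; the clip_grad_norm_ call goes between total_loss.backward() and optimizer.step(), and with a huge max_norm it only measures the norm instead of clipping):

import torch

# Inside the training loop, right after total_loss.backward():
grad_norm = torch.nn.utils.clip_grad_norm_(ddp_model.parameters(), max_norm=1e9)
if iteration % 100 == 0:
    print(f"iter {iteration}: lr {scheduler.get_last_lr()[0]:.2e} "
          f"total {total_loss.item():.4f} flow {flow_loss.item():.4f} "
          f"pose {pose_loss.item():.4f} trans {trans_loss:.4f} "
          f"rot {rot_loss:.4f} grad {float(grad_norm):.2f}")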

Is there any other information I should provide to help diagnose the issue? Thank you in advance for your help!

I apologize for the long post, but I'm at a point where I can't make further progress and really hope to get support from the community. My humble implementation is available here.

A million thanks,

Best regards.

@Zilong-L Zilong-L changed the title TartanVO Training - Uncertain Problems in Stage Two - Need help. TartanVO Training - Uncertain Problems in Stage Two Aug 7, 2024
@doujiarui

Have you loaded the pretrained PWC-Net?

@Zilong-L

Zilong-L commented Sep 3, 2024

Have you loaded the pretrained PWC-Net?

Yes sir, I loaded this pretrained checkpoint:
https://github.com/NVlabs/PWC-Net/blob/master/PyTorch/pwc_net_chairs.pth.tar
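For reference, this is roughly how I load and sanity-check it (a sketch; the checkpoint key names may not match my wrapped flow net exactly, which is what the missing/unexpected counts are meant to reveal):

import torch

ckpt = torch.load('pwc_net_chairs.pth.tar', map_location='cpu')
state = ckpt.get('state_dict', ckpt) if isinstance(ckpt, dict) else ckpt
result = ddp_model.module.flowNet.load_state_dict(state, strict=False)
print(f"missing keys: {len(result.missing_keys)}, "
      f"unexpected keys: {len(result.unexpected_keys)}")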

@doujiarui

doujiarui commented Sep 3, 2024

Could you please provide train_pose.txt and val_pase.txt? I also want to implement them. Thanks!

@Zilong-L

Zilong-L commented Sep 3, 2024

Could you please provide train_pose.txt and val_pase.txt? I also want to implement them. Thanks!

Off topic, but yes: there is a Python script in the repo that generates the reference pose file list:
utils/list_files.py

You need to download the TartanAir dataset first and run the script in that data folder. You will get something like this:

/home/lzl/code/python/tartanvo_train_implementation/data/soulcity/Easy/P012/pose_left.txt
/home/lzl/code/python/tartanvo_train_implementation/data/soulcity/Easy/P007/pose_left.txt
/home/lzl/code/python/tartanvo_train_implementation/data/soulcity/Easy/P005/pose_left.txt
/home/lzl/code/python/tartanvo_train_implementation/data/soulcity/Easy/P011/pose_left.txt
/home/lzl/code/python/tartanvo_train_implementation/data/soulcity/Easy/P000/pose_left.txt
/home/lzl/code/python/tartanvo_train_implementation/data/soulcity/Easy/P008/pose_left.txt
/home/lzl/code/python/tartanvo_train_implementation/data/soulcity/Easy/P002/pose_left.txt
/home/lzl/code/python/tartanvo_train_implementation/data/soulcity/Easy/P003/pose_left.txt
/home/lzl/code/python/tartanvo_train_implementation/data/soulcity/Easy/P001/pose_left.txt
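For reference, the script boils down to something like this (a rough sketch; the actual utils/list_files.py may differ in details such as the output file name):

import os

root = 'data'  # folder where the TartanAir environments are extracted
with open('train_pose.txt', 'w') as f:
    for dirpath, _, filenames in os.walk(root):
        if 'pose_left.txt' in filenames:
            f.write(os.path.abspath(os.path.join(dirpath, 'pose_left.txt')) + '\n')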

@doujiarui


Thank you. One thing I don't quite understand is the division into training, validation, and test datasets. I see that DPVO provides a tartan_test.txt:
abandonedfactory/abandonedfactory/Easy/P011
abandonedfactory/abandonedfactory/Hard/P011
abandonedfactory_night/abandonedfactory_night/Easy/P013
abandonedfactory_night/abandonedfactory_night/Hard/P014
Amusement/amusement/Easy/P008, ........

I don't know how the split is done in TartanVO; I can't find any such files. Do we need to write them manually based on the split described in the paper?

@Zilong-L

Zilong-L commented Sep 3, 2024

I don't know how the split is done in TartanVO; I can't find any such files. Do we need to write them manually based on the split described in the paper?

Yes, the script only generates the meta-information; you need to split the dataset manually.
In the TartanVO paper they say they set aside three scenes for validation. That's how they split the dataset.

If you encounter anything related to my codebase that needs attention, please raise an issue there. This helps keep this thread focused.

@doujiarui

Thank you very much!

@zhangcv123

I have reproduced the relevant code and can be reached by email: [email protected]
