
Sampling training data takes tens of minutes per epoch on Linux with an A800 #241

Open
GuohuaQiu1999 opened this issue Nov 6, 2024 · 0 comments

GuohuaQiu1999 commented Nov 6, 2024

With DECODE 0.10.2 and the same parameter.yaml file, sampling training data on Linux with an A800 is significantly slower than on Windows with a GTX 3080 Ti. Both machines use the simulation parameters below:

```yaml
Hardware:
  device: cuda:0
  device_ix: 0
  device_simulation: cuda:0
  num_worker_train: 1
  torch_multiprocessing_sharing_strategy: null
  torch_threads: 4
  unix_niceness: 0
Simulation:
  bg_uniform:
  - 40.0
  - 60.0
  density: null
  emitter_av: 250
  emitter_extent:
  - - -0.5
    - 63.5
  - - -0.5
    - 63.5
  - - -2000
    - 2000
  img_size:
  - 64
  - 64
  intensity_mu_sig:
  - 3000.0
  - 100.0
```
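
(Not part of the original report: as a quick sanity check that the `cuda:0` entries above resolve to the intended GPU on the Linux machine, the sketch below reads the same parameter.yaml with plain PyYAML. Using PyYAML directly, rather than DECODE's own parameter loader, is an assumption made for this sketch.)

```python
import torch
import yaml

# Read the parameter.yaml quoted above (plain PyYAML here; DECODE's own
# parameter loader is not used in this sketch).
with open("parameter.yaml") as f:
    param = yaml.safe_load(f)

# Confirm which physical GPU the configured simulation device maps to.
sim_device = torch.device(param["Hardware"]["device_simulation"])  # cuda:0
print(sim_device, torch.cuda.get_device_name(sim_device))
print("Simulation img_size:", param["Simulation"]["img_size"])     # [64, 64]
```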

On Windows with the GTX 3080 Ti, sampling training data takes about 8 seconds per epoch during training. On Linux with the A800, however, it takes tens of minutes (I did not wait for the sampling of a single epoch to finish because it took too long). I investigated the code, added print statements at key points, and found that it is very slow at this line:

```python
frames = self._spline_impl.forward_frames(*self.img_shape,
                                          frame_ix,
                                          n_frames,
                                          xyz_r[:, 0],
                                          xyz_r[:, 1],
                                          xyz_r[:, 2],
                                          ix[:, 0],
                                          ix[:, 1],
                                          weight)
```
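
(A minimal timing sketch, not DECODE code: because CUDA kernel launches can return before the GPU work finishes, surrounding the call with torch.cuda.synchronize() gives a more reliable measurement than bare print statements. The timed() helper and its commented usage with forward_frames are hypothetical.)

```python
import time
import torch

def timed(label, fn, *args, **kwargs):
    """Call fn once and print its wall-clock time. torch.cuda.synchronize()
    is called before and after so that any asynchronous CUDA work launched
    by fn is included in the measurement."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    out = fn(*args, **kwargs)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    print(f"{label}: {time.perf_counter() - t0:.3f} s")
    return out

# Hypothetical use inside the DECODE source, wrapping the slow call above:
# frames = timed("forward_frames", self._spline_impl.forward_frames,
#                *self.img_shape, frame_ix, n_frames,
#                xyz_r[:, 0], xyz_r[:, 1], xyz_r[:, 2],
#                ix[:, 0], ix[:, 1], weight)
```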

However, nvidia-smi showed GPU utilization consistently at 100%, which is very strange. I also checked the spline library and found that it was compiled for sm_37. Could this be the reason for the performance issue? On the other hand, the sm_37-compiled code does not hurt performance on Windows with the GTX 3080 Ti. Recompiling it to test this myself is quite difficult, so I would like to ask for your help.
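
(For reference: the A800 is a compute capability 8.0, i.e. sm_80, device, while sm_37 targets the much older Kepler generation, so whether the spline binary carries PTX that the driver can JIT-compile for sm_80 is relevant here. The standard PyTorch calls below report the device and the architectures the PyTorch build itself targets; they do not inspect the separately compiled spline extension.)

```python
import torch

# What the Linux GPU reports, and which architectures this PyTorch build
# targets. Note: the spline library is a separate compiled extension, so
# torch.cuda.get_arch_list() does not describe it; this is only a reference.
print(torch.cuda.get_device_name(0))         # expected: an A800 (Ampere)
print(torch.cuda.get_device_capability(0))   # expected: (8, 0), i.e. sm_80
print("PyTorch CUDA version:", torch.version.cuda)
print("PyTorch arch list:", torch.cuda.get_arch_list())
```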
