Improving performance of data loading and collecting #223

future-xy · 2022-02-10T16:31:30Z

This pull request fixes the performance issue reported in #219.

1. Convert the list to an np.ndarray first, then to a torch.tensor.
2. Reorder the ndarray with a faster method.
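A minimal sketch of the two changes listed above, not the actual diff from this PR. The function names `rows_to_tensor` and `reorder` and the toy data are hypothetical and chosen only for illustration. It assumes the per-sample data is a Python list of equally shaped NumPy rows and that the new order is given as a list of row indices.

```python
import numpy as np
import torch


def rows_to_tensor(rows):
    """Convert a list of per-sample NumPy rows into one tensor (hypothetical helper)."""
    # Stack into a single contiguous ndarray first, so the tensor is built
    # from one buffer instead of from a Python list of small arrays.
    stacked = np.asarray(rows, dtype=np.float32)
    # from_numpy wraps the ndarray without another copy.
    return torch.from_numpy(stacked)


def reorder(data, permutation):
    """Reorder the rows of `data` with one vectorized indexing call (hypothetical helper)."""
    # Integer-array indexing replaces a Python-level loop over rows.
    return data[np.asarray(permutation)]


# Toy data standing in for a batch of samples.
rows = [
    np.array([0.0, 1.0], dtype=np.float32),
    np.array([2.0, 3.0], dtype=np.float32),
    np.array([4.0, 5.0], dtype=np.float32),
]

batch = rows_to_tensor(rows)
shuffled = reorder(np.asarray(rows), [2, 0, 1])
```

Both helpers do their work in a single NumPy call, which is where the speedup over per-element Python processing would be expected to come from.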

future-xy · 2022-02-10T16:39:49Z

By the way, the time measurement in the current code (lines 1534 to 1596) is inaccurate because it excludes the cost of data collection before each iteration (line 1517), which takes almost as much time as the training itself on the Kaggle dataset.

dlrm/dlrm_s_pytorch.py

Line 1517 in 9c2fda7

for j, inputBatch in enumerate(train_ld):

With the optimization in this PR, data collection takes only about 2 seconds on the Kaggle dataset.
I tested on one 2080 Ti GPU and 20 CPUs.
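The timing gap described in this comment can be illustrated with a small sketch that separates the time spent waiting for the next batch from the time spent in the training step. This is not the code in dlrm_s_pytorch.py. `timed_epoch`, `slow_loader`, and `train_step` are hypothetical names, and the loader here is a plain generator that sleeps to simulate data collection.

```python
import time


def timed_epoch(loader, train_step):
    """Return (load_seconds, step_seconds) for one pass over `loader`."""
    load_s = 0.0
    step_s = 0.0
    it = iter(loader)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent waiting for the loader
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)  # time spent in the training step
        t2 = time.perf_counter()
        load_s += t1 - t0
        step_s += t2 - t1
    return load_s, step_s


def slow_loader(num_batches, delay):
    """Simulated loader: each batch takes `delay` seconds to produce."""
    for i in range(num_batches):
        time.sleep(delay)
        yield i


seen = []
load_s, step_s = timed_epoch(slow_loader(3, 0.01), seen.append)
```

A timer that starts only inside the loop body would report `step_s` alone and miss `load_s` entirely. When the step runs on a GPU, the step timing would also need a device synchronization before reading the clock, since CUDA kernels are launched asynchronously.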

future-xy added 2 commits February 10, 2022 16:13

fix facebookresearch#219: fix a bug on randomization

64309e7

fix facebookresearch#219: optimize performance

1481fae

1. Convert the list to an np.ndarray first, then to a torch.tensor.
2. Reorder the ndarray with a faster method.

facebook-github-bot added the CLA Signed label Feb 10, 2022

Yao Fu added 2 commits May 6, 2022 15:02

Merge branch 'facebookresearch:main' into main

3636c60

fix the best acc bug

aae6075
