Improving performance of data loading and collecting #223

Open
wants to merge 4 commits into
base: main


future-xy

This pull request fixes the performance issue reported in #219.

future-xy added 2 commits February 10, 2022 16:13
1. convert list first to np.ndarray before to torch.tensor
2. reorder ndarray faster
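The first commit's idea, converting a Python list to an `np.ndarray` before turning it into a tensor, might look roughly like the sketch below. This is not the PR's actual diff: the helper name `lists_to_tensor` and the sample data are hypothetical, and the sketch uses `torch.from_numpy` (which shares memory with the array) where the PR may well call `torch.tensor` on the array instead.

```python
import numpy as np
import torch


def lists_to_tensor(rows, dtype=np.int64):
    """Convert a list of equal-length Python lists to a tensor via NumPy.

    Hypothetical helper for illustration; not taken from the PR.
    """
    # One bulk conversion into a contiguous NumPy array.
    arr = np.asarray(rows, dtype=dtype)
    # Wrap the array as a tensor without another copy.
    return torch.from_numpy(arr)


rows = [[1, 2, 3], [4, 5, 6]]  # made-up sample data
t = lists_to_tensor(rows)
```

Going through NumPy first lets the nested list be converted in a single bulk step, which is typically much faster than handing a large nested list directly to `torch.tensor`.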
facebook-github-bot added the CLA Signed label on Feb 10, 2022. (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)

future-xy commented Feb 10, 2022

By the way, the time measurement in the current code (lines 1534 to 1596) is not accurate, because it does not include the time spent collecting data before each iteration (line 1517). On the Kaggle dataset, that step costs almost as much time as the training itself.

for j, inputBatch in enumerate(train_ld):

With the optimization in this PR, the data collection process takes only about 2 seconds on the Kaggle dataset.
I ran my test on one 2080 Ti GPU and 20 CPUs.
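To make the measurement point concrete, here is a minimal sketch of timing the batch fetch separately from the training step. The function `timed_epoch` and its `train_step` argument are hypothetical names for illustration, not code from the repository; `loader` stands in for any iterable such as `train_ld`.

```python
import time


def timed_epoch(loader, train_step):
    """Return (data_time, step_time) in seconds for one pass over loader.

    Hypothetical helper: `train_step` is any callable that consumes a batch.
    """
    data_time = 0.0
    step_time = 0.0
    it = iter(loader)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent fetching/collating the batch
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)     # forward/backward/update would go here
        t2 = time.perf_counter()
        data_time += t1 - t0
        step_time += t2 - t1
    return data_time, step_time


# Usage with stand-in data: a list as the "loader" and a no-op step.
seen = []
data_s, step_s = timed_epoch([1, 2, 3], seen.append)
```

When the step runs on a GPU, CUDA kernels launch asynchronously, so calling `torch.cuda.synchronize()` before reading the clock would be needed for the step time to be meaningful.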
