Improving performance of data loading and collecting#223

Open
future-xy wants to merge 4 commits into facebookresearch:main from future-xy:main

Conversation

@future-xy

This pull request fixes the performance issue reported in #219.

future-xy added 2 commits February 10, 2022 16:13
1. Convert the list to np.ndarray first, then to torch.tensor
2. Reorder ndarray faster
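
For readers unfamiliar with the first change, here is a minimal sketch of the idea, not the code from this PR. It builds a tensor from a nested Python list in two ways: directly with `torch.tensor`, and by first packing the list into a contiguous `np.ndarray`. The function names `to_tensor_direct` and `to_tensor_via_numpy` and the sample data are hypothetical, chosen only for illustration.

```python
import numpy as np
import torch


def to_tensor_direct(rows):
    # torch.tensor walks the nested Python list element by element.
    return torch.tensor(rows, dtype=torch.float32)


def to_tensor_via_numpy(rows):
    # Pack the list into one contiguous float32 array first.
    arr = np.asarray(rows, dtype=np.float32)
    # from_numpy wraps the array's memory without another copy.
    return torch.from_numpy(arr)


if __name__ == "__main__":
    # Hypothetical batch: 128 samples with 13 dense features each.
    rows = [[float(i + j) for j in range(13)] for i in range(128)]
    a = to_tensor_direct(rows)
    b = to_tensor_via_numpy(rows)
    print(a.shape, torch.equal(a, b))
```

Both paths should yield the same tensor; whether the NumPy path is faster, and by how much, depends on batch size and element types, so it is worth timing on the actual collate function.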
@facebook-github-bot facebook-github-bot added the CLA Signed label Feb 10, 2022
@future-xy

future-xy commented Feb 10, 2022

Author

By the way, the time measurement in the current code (lines 1534 to 1596) is not accurate: it excludes the time spent collecting data before each iteration (line 1517), which on the Kaggle dataset costs almost as much time as the training step itself.

for j, inputBatch in enumerate(train_ld):

With the optimization in this PR, data collection takes only about 2 seconds on the Kaggle dataset.
I ran my test on one 2080 Ti and 20 CPUs.
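
To make the timing gap described above concrete, here is a minimal sketch, not the repository's code, that times the batch fetch separately from the training step. `timed_epoch`, `loader`, and `step_fn` are hypothetical names; `loader` stands in for `train_ld` and `step_fn` for the body of the training loop.

```python
import time


def timed_epoch(loader, step_fn):
    """Return (load_seconds, step_seconds) for one pass over loader."""
    load_time = 0.0
    step_time = 0.0
    it = iter(loader)
    while True:
        t0 = time.perf_counter()
        try:
            # The fetch that a plain `for ... in enumerate(loader)` hides.
            batch = next(it)
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)
        t2 = time.perf_counter()
        load_time += t1 - t0
        step_time += t2 - t1
    return load_time, step_time


if __name__ == "__main__":
    def slow_loader(n, delay):
        # Simulates a loader whose collation takes `delay` seconds per batch.
        for i in range(n):
            time.sleep(delay)
            yield i

    load_s, step_s = timed_epoch(slow_loader(5, 0.01), lambda batch: None)
    print(f"load: {load_s:.3f}s, step: {step_s:.3f}s")
```

When the step runs on a GPU, CUDA kernels launch asynchronously, so a call such as `torch.cuda.synchronize()` before reading the clock is probably needed for the step time to be meaningful.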

Labels

CLA Signed

2 participants