-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WebDataset Reader #230
base: develop
Are you sure you want to change the base?
WebDataset Reader #230
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added few more comments. please update as soon as possible for us to merge
@swetha097 - please resolve conflicts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some minor comments
…097/rocAL into swbs/webdataset_PR_branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The breaks the current flow of build if libtar
is not present. We should not have it as a hard dependency. If it's missing libtar, we should disable web dataset reader. Azure CI is not failing due to this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added few more comments
This PR is huge. We need to make it into smaller PRs in future.
AsciiValues samples_ascii; | ||
AsciiComponent ascii_component; | ||
for (auto &elem : _map_content) { | ||
std::cerr << "Name :\t " << elem.first; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this function it is better to accumulate all prints in an std::string and issue one print statement at the end. Probably we can do it as another PR in all such instances
if (_meta_data_reader) | ||
THROW("A metadata reader has already been created") | ||
if (_augmented_meta_data) | ||
THROW("Metadata can only have a single output") | ||
|
||
bool generate_index = (index_path[0] == '\0') ? true : false; | ||
if (generate_index) | ||
std::cerr << "Index file is not provided, it may take some time to infer it from the tar file"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you want to continue the rest of the code after this error? why are we not using ERRLOG macro
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This statement is printed if the user doesn't pass the idx file for the webdataset tar file. We have added support to infer the file contents from the tar file if its not present.
We cannot use ERR macro here as this is not an error but a warning that it will take some time for inferring file contents. There are INFO and WRN macros which can be used here but we are not enabling them in CMakelists. Only ERR and THROW macros are available for use
int img_size = rocalGetImageNameLen(handle, image_name_length); | ||
std::vector<char> img_name(img_size); | ||
rocalGetImageName(handle, img_name.data()); | ||
std::cerr << "\n Image name: " << img_name.data() << "\n \n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we using std::cerr here and cout in other places
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the classification and detection pipelines, the image names and labels are printed using cerr while other metadata were printed using cout so we used cerr for the image name here. We'll change cout for all the prints for the webdataset pipeline
Contains support for