Recover after errors opening one of WARC files in a batch #59

Open
nvanva opened this issue Jun 1, 2024 · 1 comment
@nvanva
Contributor

nvanva commented Jun 1, 2024

We have two broken metadata.zst files: `zstdcat` fails on both with "Read error (39) : premature end". The logs for processing them are on NIRD:
two/log_html/archivebot_partial_logs/114.stderr
two/log_html/archivebot_partial_logs/97.stderr

Both logs end with "file opening failed, skipping this WARC". It seems that after failing to open one WARC in a batch, warc2text does not continue with the remaining files, and it also does not close the output .zst files correctly.

It would be good to debug what happens when one of the files in a batch cannot be opened, or when other processing errors occur.
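As a minimal illustration of the recovery behavior being asked for, here is a Python sketch (not warc2text's actual code, which is C++). It assumes two hypothetical callables, `open_output` and `process_warc`: a file that fails with `OSError` is logged and skipped, the loop moves on to the next WARC, and the shared output is scoped in a `with` block so it is always closed. For a zstd stream, closing is what writes the end of the frame, so skipping that step is one plausible cause of a "premature end" error.

```python
import logging
from typing import IO, Callable, Iterable, List


def process_batch(
    warc_paths: Iterable[str],
    open_output: Callable[[], IO[str]],        # hypothetical: opens the batch output
    process_warc: Callable[[str, IO[str]], None],  # hypothetical: handles one WARC
) -> List[str]:
    """Process every WARC in a batch, skipping files that fail.

    Returns the paths that were skipped.
    """
    skipped: List[str] = []
    # The with-block closes (and thus finalizes) the output even if an
    # unexpected exception escapes the loop.
    with open_output() as out:
        for path in warc_paths:
            try:
                process_warc(path, out)
            except OSError as exc:
                # Log and continue with the next file instead of aborting the batch.
                logging.warning("%s: file opening failed, skipping this WARC (%s)", path, exc)
                skipped.append(path)
    return skipped
```

With this structure, one unreadable WARC costs only its own records: the remaining files are still processed and the output is closed normally.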

@ZJaume
Member

ZJaume commented Jun 3, 2024

Could you copy the problematic WARCs to LUMI? I do not have access to NIRD.

ZJaume self-assigned this Jun 3, 2024