Fix PDB download #77

Merged
merged 2 commits into from
Feb 23, 2024

Conversation

linusyh (Contributor) commented Feb 23, 2024

Issue:

  • workshop download pdb gets stuck, failing to terminate.

Cause:

  • The protein with pdb_id 8CKB is not available in the MMTF format, as verified with the official mmtf-python API. This causes rcsb.fetch to throw a RequestError which, when uncaught, causes the entire download process to hang.

Solution:

  • Catch RequestError in download_pdb_mmtf
  • Attempt retries in case of network issues

During implementation, I found that download_pdb_mmtf is duplicated in datasets/utils and scripts/download_pdb_mmtf.

  • Remove code duplication
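
A minimal sketch of the first two Solution items, offered as an illustration rather than the code merged in this PR. It assumes a stand-in RequestError class and an injectable fetch callable in place of rcsb.fetch; the names download_with_retries, max_retries and delay_s are hypothetical.

```python
import time
from typing import Callable, Optional


class RequestError(Exception):
    """Stand-in for the RequestError raised by rcsb.fetch (assumption)."""


def download_with_retries(
    pdb_id: str,
    fetch: Callable[[str], str],
    max_retries: int = 3,
    delay_s: float = 1.0,
) -> Optional[str]:
    """Try to fetch one entry; return its path, or None if all attempts fail.

    `fetch` is a hypothetical callable that takes a PDB ID and returns the
    path of the downloaded file, raising RequestError on failure.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return fetch(pdb_id)
        except RequestError as err:
            # Catch the error so one bad entry cannot stall the whole run.
            print(f"[{pdb_id}] attempt {attempt}/{max_retries} failed: {err}")
            if attempt < max_retries:
                # Wait before retrying in case the failure was transient.
                time.sleep(delay_s)
    # The entry is treated as unavailable; the caller can skip it.
    return None
```

Returning None instead of re-raising lets a caller that loops over many PDB IDs skip an unavailable entry such as 8CKB and continue with the rest.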

* Catch RequestError from pdb download

* Implement retries in case of network issues

* Remove code duplication

chaitjo (Collaborator) commented Feb 23, 2024

Good catch that this particular protein is not downloadable. The last time we downloaded the PDB, this entry had not yet been deposited, so the error never surfaced.

@a-r-j, @amorehead, is there a way to set a cutoff date after which we do NOT download newer PDB entries via PDB manager?

a-r-j (Owner) commented Feb 23, 2024

Thanks for the fix @linusyh

@chaitjo The PDB download is intended to function as a local copy, so the correct behaviour is to download every entry that is downloadable. Examples in downstream task datasets won't drift, and this should automatically handle PDB code deprecation as structures are replaced with newer entries.

a-r-j (Owner) commented Feb 23, 2024

@linusyh could you add a note to the changelog, please?

a-r-j closed this Feb 23, 2024
a-r-j reopened this Feb 23, 2024
a-r-j merged commit 1177022 into a-r-j:main Feb 23, 2024
1 check passed
linusyh deleted the fix-pdb-download branch February 29, 2024 20:56
3 participants