Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Adds FileDictMixin #347

Merged
merged 23 commits into from
Feb 11, 2025
Merged

ENH: Adds FileDictMixin #347

merged 23 commits into from
Feb 11, 2025

Conversation

VinzentRisch
Copy link
Contributor

@VinzentRisch VinzentRisch commented Oct 15, 2024

solves #346

  • Adds FileDictMixin class that includes the function file_dict.

  • For per sample directories it returns a mapping of sample id to another dictionary where keys represent the ID and values correspond to the filepath for each file.

  • For files, it returns a mapping of ID to filepath for each file.

  • To create the ID the specified suffix is removed from the filename. it is expected that the suffix is given with the used separator character like ".report"

  • This mixin can only be added to classes with a pathspec attribute.

  • For now it is added to Kraken2ReportDirectoryFormat and Kraken2OutputsDirectoryFormat.

  • I also removed GenomeDataDirFmt and its genome_dict function and added FileDictMixin to GenesDirectoryFormat, ProteinsDirectoryFormat, LociDirectoryFormat and GenomeSequencesDirectoryFormat

This Mixin would also work with MAGSequencesDirFmt and MultiMAGSequencesDirFmt.

@VinzentRisch VinzentRisch marked this pull request as ready for review October 17, 2024 14:01
@ebolyen ebolyen self-assigned this Dec 12, 2024
@VinzentRisch
Copy link
Contributor Author

Hi @ebolyen @lizgehret
Could someone have a look at this?
The Idea is to have a mixin that can be added to any directory format that enables the format to use a file_dict function, similar to sample_dict function in MultiMAGSequencesDirFmt. So that the sample_dict function does not have to be added to any new formats.
This would be helpful for bokulich-lab/q2-annotate#212 and for my new plugin that I am working on right now.

Let me know if this makes sense to you or if you have any questions.

@cherman2 cherman2 assigned colinvwood and unassigned ebolyen Feb 4, 2025
Copy link
Contributor

@colinvwood colinvwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @VinzentRisch, I think this will be very helpful. One thing I'm noticing is that a user of this method will have to be aware of the suffixes that they want to remove. What if instead the suffixes lived as state on the classes that support the mixin, and your method reads them from there? This feels a bit more convenient--I wouldn't have to look up the kraken2 format for example if I'm developing an action that needs this sample-id/filepath mapping. The drawback is that we'll have to make sure that this mixin is only added to classes that have this required class attribute (I'm thinking this could just be the suffixes list that's currently a parameter to your method), but this seems worthwhile. What are your thoughts?

@colinvwood colinvwood assigned VinzentRisch and unassigned colinvwood Feb 4, 2025
@VinzentRisch
Copy link
Contributor Author

Thanks for the review! The suffixes as an attribute is a good idea. I implemented it with the option to add the mixin to classes that dont have the attribute.

Copy link
Contributor

@colinvwood colinvwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VinzentRisch This looks good. Should we add some of these suffixes attributes to the classes the mixin was added to? I think it would also be best if we had one test that uses a real format.

@VinzentRisch
Copy link
Contributor Author

I already added the suffixes to the Kraken formats. The GenomeData formats dont use any suffixes.
I will add tests for a real format.

@VinzentRisch
Copy link
Contributor Author

Ah oops, I forgot to push that last commit. Now the suffixes are added to the Kraken formats.
And I added the test with a real format.

Copy link
Contributor

@colinvwood colinvwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VinzentRisch, thank you for the attribute changes, I think this will be helpful. Just a couple more small suggestions and then I think it'll be good to merge.

@VinzentRisch
Copy link
Contributor Author

Hey @colinvwood
Thanks for the notes! I added the helper function to the class.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Completed
Development

Successfully merging this pull request may close these issues.

3 participants