-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Adds FileDictMixin #347
Conversation
Hi @ebolyen @lizgehret Let me know if this makes sense to you or if you have any questions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @VinzentRisch, I think this will be very helpful. One thing I'm noticing is that a user of this method will have to be aware of the suffixes that they want to remove. What if instead the suffixes lived as state on the classes that support the mixin, and your method reads them from there? This feels a bit more convenient--I wouldn't have to look up the kraken2 format for example if I'm developing an action that needs this sample-id/filepath mapping. The drawback is that we'll have to make sure that this mixin is only added to classes that have this required class attribute (I'm thinking this could just be the suffixes
list that's currently a parameter to your method), but this seems worthwhile. What are your thoughts?
Thanks for the review! The suffixes as an attribute is a good idea. I implemented it with the option to add the mixin to classes that dont have the attribute. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@VinzentRisch This looks good. Should we add some of these suffixes attributes to the classes the mixin was added to? I think it would also be best if we had one test that uses a real format.
I already added the suffixes to the Kraken formats. The GenomeData formats dont use any suffixes. |
Ah oops, I forgot to push that last commit. Now the suffixes are added to the Kraken formats. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@VinzentRisch, thank you for the attribute changes, I think this will be helpful. Just a couple more small suggestions and then I think it'll be good to merge.
Hey @colinvwood |
solves #346
Adds FileDictMixin class that includes the function file_dict.
For per sample directories it returns a mapping of sample id to another dictionary where keys represent the ID and values correspond to the filepath for each file.
For files, it returns a mapping of ID to filepath for each file.
To create the ID the specified suffix is removed from the filename. it is expected that the suffix is given with the used separator character like ".report"
This mixin can only be added to classes with a pathspec attribute.
For now it is added to
Kraken2ReportDirectoryFormat
andKraken2OutputsDirectoryFormat
.I also removed GenomeDataDirFmt and its genome_dict function and added FileDictMixin to GenesDirectoryFormat, ProteinsDirectoryFormat, LociDirectoryFormat and GenomeSequencesDirectoryFormat
This Mixin would also work with
MAGSequencesDirFmt
andMultiMAGSequencesDirFmt
.