single_to_multi_fast5 does not collect all the single files if the input folder contains mixed types of fast5 files. #81

Open
Marjan-Hosseini opened this issue Dec 3, 2023 · 0 comments

Marjan-Hosseini commented Dec 3, 2023

I have a dataset that contains thousands of fast5 files, a mix of multi and single fast5 files, in a non-homogeneous folder structure.
I want to convert all of them to multi fast5 files.

My current workaround is to first convert all multi fast5 files to single. The command multi_to_single_fast5 converts only the multi fast5 files and writes the resulting single files to a new folder:

orig_path=mixed
save_path=multi
single_path=single
multi_to_single_fast5 -i $orig_path/ -s $single_path/ --recursive

The above command collects all the reads that exist in any multi fast5 files as single fast5 files in $single_path.
Then I can convert them all back to multi and make sure I am not missing any read:

single_to_multi_fast5 -i $single_path/ -s $save_path/ --filename_base $output_name --batch_size 1000 --recursive

The above command works fine too.
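
A cheap sanity check on the round trip above, as a sketch only: count the files in $single_path and derive how many multi files a given --batch_size should produce. It assumes one read per single fast5 file and that single_to_multi_fast5 fills each output file up to the batch size. The helper names count_fast5 and expected_batches are made up for illustration and are not part of ont_fast5_api.

```shell
# Count .fast5 files under a directory, recursively.
# Assumes file names contain no newline characters.
count_fast5() {
    find "$1" -type f -name '*.fast5' | wc -l | tr -d ' '
}

# Expected number of multi fast5 output files for N single files at a
# given batch size (ceiling division).
# ASSUMPTION: one read per single file, batches filled in order.
expected_batches() {
    echo $(( ($1 + $2 - 1) / $2 ))
}
```

For example, `n=$(count_fast5 "$single_path")` followed by `expected_batches "$n" 1000` gives the number of files to expect in $save_path. A mismatch only hints at a problem; it does not say which reads are missing.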
Now I want to run single_to_multi_fast5 directly on a folder that contains both multi and single fast5 files ($orig_path), and I expect it to collect all the reads from the single files in $orig_path into multi files.

single_to_multi_fast5 -i $orig_path/ -s $save_path/ --filename_base $output_name --batch_size 1000 --recursive

But I don't get all the reads from the single fast5 files: some reads are missing in the output folder. The same command works fine on a folder that contains only single fast5 files.

Nothing is overwritten, and I am testing these steps on a few files in a different folder.
Is there a solution to this problem other than checking whether every file is multi or single? My dataset is huge, so I cannot check individual files by hand. It would take ages.
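
One possible way to avoid checking files by hand, sketched under assumptions: classify each file automatically and stage only the single files in a separate folder, then point single_to_multi_fast5 at that folder. The sketch assumes that multi fast5 files have top-level HDF5 groups named read_<id> (single files do not), and that the h5ls tool from the HDF5 command-line utilities is installed. The names is_multi_fast5, stage_single_fast5, and FAST5_CLASSIFIER are hypothetical and not part of ont_fast5_api.

```shell
# Succeed if the file looks like a multi fast5 file.
# ASSUMPTION: multi files have top-level groups named read_<id>.
is_multi_fast5() {
    h5ls "$1" 2>/dev/null | grep -q '^read_'
}

# Symlink every file under $1 that is NOT classified as multi into $2.
# The classifier can be swapped via the hypothetical FAST5_CLASSIFIER variable.
stage_single_fast5() {
    src=$1
    dest=$2
    classify=${FAST5_CLASSIFIER:-is_multi_fast5}
    # Without h5ls the default classifier would call every file "single".
    if [ "$classify" = is_multi_fast5 ] && ! command -v h5ls >/dev/null 2>&1; then
        echo "h5ls not found; install the HDF5 tools or set FAST5_CLASSIFIER" >&2
        return 1
    fi
    mkdir -p "$dest"
    n=0
    find "$src" -type f -name '*.fast5' | while IFS= read -r f; do
        if ! "$classify" "$f"; then
            n=$((n + 1))
            # Absolute target so the link stays valid from inside $dest.
            abs="$(cd "$(dirname "$f")" && pwd)/$(basename "$f")"
            # Counter prefix avoids clashes between equal names in different subfolders.
            ln -s "$abs" "$dest/${n}_$(basename "$f")"
        fi
    done
}
```

After `stage_single_fast5 "$orig_path" staged_single`, the single_to_multi_fast5 command from above could be run with `-i staged_single/`. Whether single_to_multi_fast5 follows symlinks has not been verified here; if it does not, replacing `ln -s` with `cp` trades disk space for certainty. Unreadable or corrupt files are also treated as "not multi" by this sketch, so they would need a separate check.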
