Generating sequencing summary from fast5 raw reads #47

Open
maximilianmordig opened this issue Jun 27, 2022 · 1 comment

Comments

@maximilianmordig

Hi @skovaka
Thank you for developing UNCALLED.

I am wondering how to generate the sequencing summary file needed to run the "uncalled sim" command as described in the README: /path/to/control/fast5s --ctl-seqsum /path/to/control/sequencing_summary.txt. These files don't seem to be provided.
I have downloaded some E. coli fast5 raw reads, but unfortunately they don't come with a sequencing_summary.txt. As I understand it, the control fast5 files are only used to supply the raw fast5 signal for the simulation, so I am also wondering why it relies on fields such as template_duration, which are basecaller-specific.

Thank you.

@skovaka
Owner

skovaka commented Jun 29, 2022

We mainly use the sequencing summary to infer the timing between reads on each channel. This information is present in the fast5s as well, but parsing every fast5 file takes much longer than reading one text file. We also use the template start and duration to trim the adapter sequence and any noisy signal from each end of the reads. The ReadUntil API is able to do this in real time, and the sequencing summary was the best/easiest way I could find to mimic that behavior. So you are correct that it should be possible to simulate without a sequencing summary, but it would take some effort to work around those issues.
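
To make the two uses above concrete, here is a minimal sketch of how the per-channel timing and the template window could be read from a sequencing summary. It is not UNCALLED's actual code. It assumes a tab-separated file with the columns channel, start_time, duration, template_start and template_duration, that template_start is measured from the start of the run like start_time, and a sample rate of 4000 Hz; the function names are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_rows(seqsum_text):
    """Parse tab-separated sequencing summary text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(seqsum_text), delimiter="\t"))


def channel_gaps(rows):
    """Map each channel to the idle times (seconds) between consecutive reads."""
    by_channel = defaultdict(list)
    for row in rows:
        start = float(row["start_time"])
        end = start + float(row["duration"])
        by_channel[int(row["channel"])].append((start, end))

    gaps = {}
    for channel, reads in by_channel.items():
        reads.sort()  # order the reads on this channel by start time
        # Gap = start of the next read minus end of the current one.
        gaps[channel] = [nxt[0] - cur[1] for cur, nxt in zip(reads, reads[1:])]
    return gaps


def template_window(row, sample_rate=4000):
    """Return (first, last) sample offsets of the template in the raw signal.

    Assumption: template_start is in seconds since the start of the run,
    so subtracting start_time gives the offset within the read.
    """
    offset = float(row["template_start"]) - float(row["start_time"])
    first = round(offset * sample_rate)
    last = first + round(float(row["template_duration"]) * sample_rate)
    return first, last
```

Calling channel_gaps(read_rows(text)) on the contents of a sequencing_summary.txt gives the delays between reads on each channel, and template_window(row) gives the slice of raw signal left after trimming the adapter and trailing noise. Both template columns come from the basecaller, which is why they cannot be recovered from raw fast5 files alone.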

Some example sequencing summaries from human and a mock microbial community are available here: https://labshare.cshl.edu/shares/schatzlab/www-data/UNCALLED/simulator_files/
