Hi @skovaka,

Thank you for developing UNCALLED.

I am wondering how to generate the sequencing_summary file that is needed to run the `uncalled sim` command as described in the README: `/path/to/control/fast5s --ctl-seqsum /path/to/control/sequencing_summary.txt`. These files don't seem to be provided.

I have downloaded some E. coli fast5 raw reads, but unfortunately they don't come with a sequencing_summary.txt. As I understand it, the control fast5 files are only used to supply raw signal for the simulation, so I am also wondering why the simulator relies on fields such as `template_duration`, which are basecaller-specific.

Thank you.
We mainly use the sequencing summary to infer the timing between reads on each channel. This information is also present in the fast5s, but parsing every fast5 file takes much longer than reading one text file. We also use the template start and duration to trim the adapter sequence and any noisy signal from each end of the reads. The ReadUntil API can do this trimming in real time, and the sequencing summary was the easiest way I could find to mimic that behavior. So you are correct that it should be possible to simulate without a sequencing summary, but working around those issues would take some effort.
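To make those two uses concrete, here is a minimal Python sketch (not UNCALLED's own code) that reads a tab-separated sequencing summary, computes the gap between consecutive reads on each channel, and converts `template_start`/`template_duration` into a sample window within a read's raw signal. It assumes Guppy-style columns named `read_id`, `channel`, `start_time`, `duration`, `template_start` and `template_duration`, with all times in seconds on the same run clock, and a 4000 Hz sampling rate; check those against your own files. The function names and `SAMPLE_RATE` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical constant: a typical MinION sampling rate in Hz. Check the
# sampling rate recorded in your own fast5 metadata instead of trusting this.
SAMPLE_RATE = 4000.0


def load_summary(text):
    """Parse a tab-separated sequencing summary into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def channel_gaps(rows):
    """Return {channel: [gap_seconds, ...]} between consecutive reads.

    A gap is the time from the end of one read (start_time + duration)
    to the start of the next read on the same channel.
    """
    by_channel = defaultdict(list)
    for r in rows:
        by_channel[r["channel"]].append(
            (float(r["start_time"]), float(r["duration"]))
        )
    gaps = {}
    for ch, reads in by_channel.items():
        reads.sort()  # order reads on this channel by start time
        gaps[ch] = [
            next_start - (start + dur)
            for (start, dur), (next_start, _) in zip(reads, reads[1:])
        ]
    return gaps


def template_window(row, sample_rate=SAMPLE_RATE):
    """Return (first_sample, end_sample) of the template in the raw signal.

    Assumes template_start is an absolute time like start_time, so the
    offset into the read is their difference; seconds are converted to
    sample indices. Samples outside this window (adapter, noisy ends)
    would be trimmed.
    """
    offset = float(row["template_start"]) - float(row["start_time"])
    first = round(offset * sample_rate)
    end = first + round(float(row["template_duration"]) * sample_rate)
    return first, end
```

Applied to a summary row with `start_time` 10.0, `template_start` 10.5 and `template_duration` 1.25, `template_window` gives samples 2000 to 7000, so the first 0.5 s of signal would be dropped as adapter.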