High count of no-barcode that map to my targets #363
When you run readfish on playback with a barcoded sample you can get very unexpected results. What you are seeing here is a consequence of how playback works. Imagine a single channel on the sequencer: playback provides the signal that passed through that channel exactly as it happened on the original run. If you send a message to the sequencer to throw away the read, the sequencer breaks the playback read at that point and then starts a new read from the signal still coming from the playback file. That is not a true read, so it will not begin with a barcode: you are starting halfway through a read, and it is correctly classified as unclassified (if that makes sense!). I hope that helps.
Thank you for the answer, I was wondering how it worked. So in general a simulated run will underestimate the number of true hits? Do you usually run several simulated runs to get an idea of the performance of AS?
Do you usually get a better idea of the performance by not testing with barcoded experiments?
If you wish to simulate barcoded runs (or non-barcoded runs) you can use our Icarust tool: https://academic.oup.com/bioinformatics/article/40/4/btae141/7628125. It all depends on what you are trying to test with respect to performance as to which approach is better.
The idea for us is basically to compare a WGS run to adaptive sampling. So we perform a WGS run and then run a simulated AS run for the same amount of time using the bulk file.
OK - assume you are targeting a 1 Mb region of a 10 Mb genome. When you run your normal WGS (and record a bulkfile) you obtain 100x coverage of the 10 Mb genome (and therefore 100x of your 1 Mb region). When you run adaptive sampling on the playback of that bulkfile you will still end up with approximately 100x coverage of your 10 Mb genome. The reason for this is that in playback you don't actually throw away the read - you merely break a read that you don't want into smaller bits. So to see the effect of adaptive sampling on your run on playback you would need to look at the read lengths on your 1 Mb target region vs your 9 Mb of off-target. You should see long reads "on-target" and short reads "off-target". In contrast, Icarust will simulate reads being removed, but does so from simulated data. With Icarust you will see actual enrichment - but it is more theoretical.
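To make that arithmetic concrete, here is a minimal sketch with the numbers from the example above; the 10 kb on-target and 1 kb off-target read lengths are illustrative assumptions, not measured values:

```python
# Hypothetical numbers showing why playback coverage doesn't change.
# A 10 Mb genome with a 1 Mb target, sequenced to 100x in the original run.
genome_size = 10_000_000
target_size = 1_000_000
coverage = 100

# Playback with adaptive sampling: off-target reads are broken into
# ~1 kb fragments, on-target reads stay full length (say 10 kb), but
# every base is still replayed, so total bases per region are unchanged.
on_target_bases = target_size * coverage
off_target_bases = (genome_size - target_size) * coverage

mean_on_target_read = 10_000   # full-length reads accepted on target
mean_off_target_read = 1_000   # rejected reads broken into ~1 kb chunks

n_on = on_target_bases // mean_on_target_read     # 10,000 long reads
n_off = off_target_bases // mean_off_target_read  # 900,000 short reads

print(f"on-target:  {n_on:,} reads of ~{mean_on_target_read} b, "
      f"{on_target_bases / target_size:.0f}x coverage")
print(f"off-target: {n_off:,} reads of ~{mean_off_target_read} b, "
      f"{off_target_bases / (genome_size - target_size):.0f}x coverage")
# Coverage is ~100x everywhere; only the read-length distribution differs.
```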
I was also wondering how the playback works when your amount of "on target" reads is limited (because of WGS): what happens when there are no more reads corresponding to your reference (say your 1 Mb region)? I saw the release of Icarust; I was under the impression that it was not yet finished, but I will try it eventually. My initial thought was that playback runs were a more realistic approach to testing, but apparently I was wrong.
I wouldn't say you are wrong... it's just that playback doesn't actually remove the molecule from the sequencer and so it's really easy to misinterpret the results. We use playback to look at the relative lengths of the molecules we get on and off target. They should be as short as possible off target and as long as they originally were on target. You can also check mapping efficiency using playback. You just won't see any actual enrichment! Hope all these comments are useful :-)
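If it helps, here is a minimal sketch of that read-length check using pysam; the filename `playback.bam` and the target coordinates are placeholders you would swap for your own aligned playback run:

```python
# Sketch: compare read lengths on vs off target from a playback-run BAM.
# Assumes a sorted, indexed BAM ("playback.bam") and one hypothetical
# target region; adjust the name and coordinates to your experiment.
import pysam

TARGET = ("chr1", 0, 1_000_000)  # placeholder target region

def mean(xs):
    return sum(xs) / len(xs) if xs else 0.0

on_target, off_target = [], []
with pysam.AlignmentFile("playback.bam", "rb") as bam:
    for read in bam.fetch():
        if read.is_unmapped or read.is_secondary or read.is_supplementary:
            continue
        contig, start, end = TARGET
        in_target = (read.reference_name == contig
                     and read.reference_start < end
                     and read.reference_end > start)
        (on_target if in_target else off_target).append(read.query_length)

# Expect full-length reads on target and short, chopped reads off target.
print(f"on-target:  n={len(on_target):,}, mean length={mean(on_target):.0f}")
print(f"off-target: n={len(off_target):,}, mean length={mean(off_target):.0f}")
```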
OK, gotcha! Thanks for the plots, very informative. How does MinKNOW pick the next read to come in when using playback: does the initial "timeline" hold, or is it randomly picked from the bag of molecules that went through this channel during the original run?
The initial timeline holds - reads play out exactly as they were recorded. All that happens is that the signal from a read gets broken up into smaller chunks.
So what happens to reads that are broken up? Will the next read come in at the same time as it was originally sequenced? Does the pore just wait?
No. Imagine you have a read of 10 kb. The sequencer plays back the first 1,000 bases, but you then "unblock", so the sequencer ends the read at 1 kb. But the signal from the original 10 kb read keeps playing, so you will get a new read starting at 1 kb (plus a small bit) into the old read. And so on, until the original read finishes - and then it goes on to the next read. So your original 10 kb read could be chopped up into 10 fragments of 1 kb each. Does that make sense? That is why you end up with reads without a barcode when you run adaptive sampling on a playback run.
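As a toy illustration of that chopping (the 10 kb read and 1 kb decision point come from the example above; this is a model of the behaviour, not readfish or MinKNOW code):

```python
# Toy model of playback chopping: a 10 kb read whose signal keeps playing
# after each unblock, yielding a new read from the break point onward.
READ_LENGTH = 10_000    # original molecule, in bases
DECISION_POINT = 1_000  # bases seen before an unblock is sent

fragments = []
position = 0
while position < READ_LENGTH:
    # Each "new" read starts mid-molecule, so it carries no barcode and
    # will be reported as unclassified / no_barcode_found.
    fragment_len = min(DECISION_POINT, READ_LENGTH - position)
    fragments.append((position, position + fragment_len))
    position += fragment_len

print(f"{len(fragments)} fragments: {fragments[:3]} ...")
# -> 10 fragments of 1 kb; only the first one starts with a barcode.
```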
I see what you mean, it is much clearer now. So while the signal of the unblocked read is still playing, new reads keep starting in the same "pore", chopping it up?
Dear Readfish team,
Thank you for your work. I am using readfish to enrich for certain targets and I am testing it with a simulated run (2 hours of WGS). It seems that since Dorado there is a workaround to deal with the empty barcode signal (see the issue commented on by Matt Loose). I am generally under the impression that I find a lot of no_barcode_found and unclassified barcodes when doing a simulation with readfish. I was wondering if this was due to the simulation run or an issue linked to Dorado?
[Plot: readfish simulated run with AS]
[Plot: original WGS run]
Best,
Loïc