variants option #31
Comments
Hi Molly, thank you for your issue. You are right, the variant function was removed at some point, and I think we should bring it back to make it consistent with the publication. The association workflow should also be able to handle single-nucleotide changes between oligos. But for the association workflow I have to ping @graciegordon, because this step is different between SF (implemented in MPRAflow) and our Berlin pipeline. We can handle single base-pair differences (ref/alt), as you always have when you have designed oligos with one SNV. So I am happy to help you get the association and its pickle file, which is needed for the count workflow. The count workflow is not aware of variants, but it does not have to be. When you have the final count table, you can simply use the ratios between alt and ref. Or try to export everything to MPRAnalyze, which is also able to generate variant effects (in a more sophisticated way). Best,
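As a rough illustration of the ratio approach described above, the sketch below computes log2(alt / ref) for each designed ref/alt pair. It assumes the per-oligo activities (for example RNA/DNA ratios) have already been read from the final count table into a dictionary. The oligo names, the values, and the `variant_effects` helper are hypothetical and are not part of MPRAflow.

```python
import math


def variant_effects(activity, pairs):
    """Return log2(alt / ref) for each (ref_name, alt_name) pair.

    activity: dict mapping oligo name -> activity (e.g. an RNA/DNA ratio).
    pairs: iterable of (ref_name, alt_name) tuples taken from the design.
    """
    effects = {}
    for ref_name, alt_name in pairs:
        ref = activity.get(ref_name)
        alt = activity.get(alt_name)
        # Skip pairs where either allele is missing or has no usable signal.
        if ref is None or alt is None or ref <= 0 or alt <= 0:
            continue
        effects[(ref_name, alt_name)] = math.log2(alt / ref)
    return effects


# Hypothetical values, for illustration only.
activity = {"enh1_ref": 2.0, "enh1_alt": 4.0, "enh2_ref": 1.5}
pairs = [("enh1_ref", "enh1_alt"), ("enh2_ref", "enh2_alt")]
print(variant_effects(activity, pairs))
```

The second pair is dropped because its alt allele has no entry, which is one simple way to handle oligos that did not receive enough barcodes.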
Hi Molly, Thanks for bringing this to our attention! I had a branch where I was testing the variant option, but it seems that this never got fully integrated. I'll try to fix this as soon as possible. One thing that can be done in the meantime with the current pipeline is to use the --cigar flag. This is a simple filter that can be used to force exact matches between the association sequencing and the sequences in the design file (the oligos that were ordered). So if your candidate sequence was 200bp, you would just pass 200M to the cigar flag. As a warning, this is a very stringent threshold and runs the risk of removing barcodes from the analysis. To add to Max's comment about MPRAnalyze, we do have a flag in the count workflow that should allow for easy export to that tool. Hope this helps,
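To make the effect of such a CIGAR filter concrete, here is a minimal sketch that keeps only SAM records whose CIGAR string equals the full oligo length followed by M (for example 200M). It is an illustration of the idea and not MPRAflow's own implementation; the function names are hypothetical. Note that in the SAM format M covers both matching and mismatching aligned bases, so this check by itself rules out indels and clipping rather than single-base mismatches.

```python
def is_full_length_match(cigar, oligo_length):
    """True if the CIGAR is a single M operation spanning the whole oligo."""
    return cigar == f"{oligo_length}M"


def keep_full_length_matches(sam_lines, oligo_length):
    """Return the SAM alignment lines that pass the CIGAR filter."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line, not an alignment record
        fields = line.rstrip("\n").split("\t")
        # Column 6 (index 5) of a SAM record is the CIGAR string.
        if len(fields) >= 6 and is_full_length_match(fields[5], oligo_length):
            kept.append(line)
    return kept


# Hypothetical records, for illustration only.
example = [
    "@HD\tVN:1.6",
    "read1\t0\toligoA\t1\t60\t200M\t*\t0\t0\t*\t*",
    "read2\t0\toligoA\t1\t60\t150M50S\t*\t0\t0\t*\t*",
]
print(keep_full_length_matches(example, 200))
```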
Hi, Thank you very much for your advice and your patience in waiting for my response (I have been troubleshooting my results for the association step). I am still finding that no barcodes associate with candidate regulatory sequences. As I mentioned in my previous comment, I believe this is because the alignment is failing, as the fastq sequences map to multiple sequences in the design.fa file due to their similarity. I’d hoped the variant flag (and the workarounds you suggested) would help with this, but I am still getting no associations. Could I please check whether the pipeline can handle candidate regulatory sequences tested in both forward and reverse orientation? I have forward and reverse complement sequences in my design file, and I wondered if these may be ‘seen’ as similar by the aligner. Many thanks,
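One way to see how many design entries are affected is to list the sequences that are exact reverse complements of another entry. The sketch below assumes the design has already been loaded into a dictionary of name to sequence; the names and the `find_reverse_complement_pairs` helper are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def find_reverse_complement_pairs(design):
    """Return sorted (name, partner) pairs whose sequences are reverse complements."""
    by_seq = {seq.upper(): name for name, seq in design.items()}
    pairs = []
    for name, seq in design.items():
        partner = by_seq.get(reverse_complement(seq.upper()))
        # Report each pair once and ignore self-palindromic sequences.
        if partner is not None and name < partner:
            pairs.append((name, partner))
    return sorted(pairs)


# Hypothetical toy design, for illustration only.
design = {"crs1_fwd": "AACG", "crs1_rev": "CGTT", "crs2": "AAAA"}
print(find_reverse_complement_pairs(design))
```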
Hi Molly, The pipeline hasn't been tested on any data where the forward and reverse sequences were included in the same oligo pool. I think you're right that they might be seen as similar by the aligner. You should be able to check this by looking at the SAM files from your alignment with the command "samtools view -F 20 your_file.sam", which should keep only the reads that are mapped in the forward direction. You can then see whether the MAPQ scores are similar between the forward and reverse mappings. This is a flag that I think we could add as an option to the pipeline if it's helpful. Best,
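The same comparison can be sketched in a few lines of Python. The example below reads SAM records, drops unmapped reads (flag bit 4), splits the rest by strand (flag bit 16), and reports the mean MAPQ for each strand. The bit values and column positions follow the SAM format; the `mapq_by_strand` helper and the example records are hypothetical.

```python
from statistics import mean


def mapq_by_strand(sam_lines):
    """Return the mean MAPQ of mapped reads, split into forward and reverse."""
    mapq = {"forward": [], "reverse": []}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 4:
            continue  # read is unmapped
        strand = "reverse" if flag & 16 else "forward"
        # Column 5 (index 4) of a SAM record is the mapping quality.
        mapq[strand].append(int(fields[4]))
    return {s: (mean(v) if v else None) for s, v in mapq.items()}


# Hypothetical records, for illustration only.
example = [
    "read1\t0\tcrs1_fwd\t1\t60\t200M\t*\t0\t0\t*\t*",
    "read2\t16\tcrs1_rev\t1\t0\t200M\t*\t0\t0\t*\t*",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "read4\t0\tcrs2_fwd\t1\t40\t200M\t*\t0\t0\t*\t*",
]
print(mapq_by_strand(example))
```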
Hi, Thanks for your message, Gracie. Using the -F flag, it does look as though the forward and reverse sequences are being read the same by the aligner. I wondered whether it would be possible for @visze to point me in the direction of the Berlin pipeline, which has the variant option available, as I’d also be interested to see how this works with our data. Is it in an older version of the pipeline on GitHub? At the moment we are looking at workarounds to manage the forward/reverse sequences in the library. Many thanks, Molly
Hi Molly, Our pipeline is not available online, but we are happy to help. It might be best if you contact us directly so that we can schedule a short online meeting. You can find my email and phone number on our website. Best,
Hi @visze and @graciegordon, similar to @mollywent, I am making an MPRA library to test SNVs and have been closely following the lentiMPRA protocol. Has the --variants option been added back to the association utility? If not, should I expect MPRAflow to be able to discern between my ref and alt alleles (as suggested by @visze)? Thank you guys for your continued support for MPRAflow!
Hi Travis, I am sorry, but the option is not in yet. But good news: I think you do not need it. :-) When you design your library you know the exact reference and the location where the variant is. You can completely forget that you have variants until the end of the Then you should use the
Yes, it is very stringent, but normally this is absolutely enough, and you will reduce errors in your assignment, which gives you more stable counts (if you can assign enough barcodes). Only at the end of the count workflow, when you have the counts, can you bring the reference and alternative sequences of the variants together and get the fold changes.
Also jumping in here -- my library design contains multiple variants per target sequence (3-5), and I'm wondering what the recommended approach would be. I am getting decent results so long as I don't include a MAPQ filter, as most sequences map to multiple CRSs. However, can I be sure that the Association utility uses the "best" alignment, when there is one (perhaps on the basis of the AS/XS tags)? Also curious how ties would be dealt with. Apologies if this is explained somewhere, but I couldn't find documentation about this behavior. Thanks! Colin
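For checking by hand whether a read has a single best hit, the AS and XS tags can be compared directly. The sketch below treats a record as uniquely best when AS is strictly greater than XS, or when XS is absent, and treats AS equal to XS as a tie. AS and XS are optional tags written by aligners such as BWA-MEM and Bowtie 2; this is only an inspection aid with hypothetical helper names, and it does not describe how the Association utility itself selects alignments or resolves ties.

```python
def alignment_scores(sam_line):
    """Return (AS, XS) from a SAM record; a missing tag is returned as None."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # Optional fields start at column 12 (index 11) and look like TAG:TYPE:VALUE.
    for field in fields[11:]:
        name, typ, value = field.split(":", 2)
        if name in ("AS", "XS") and typ == "i":
            tags[name] = int(value)
    return tags.get("AS"), tags.get("XS")


def has_unique_best_hit(sam_line):
    """True if the reported alignment scores strictly better than the runner-up."""
    best, second = alignment_scores(sam_line)
    if best is None:
        return False  # no alignment score available to judge
    return second is None or best > second


# Hypothetical records, for illustration only.
tied = "read1\t0\tcrsA\t1\t0\t200M\t*\t0\t0\t*\t*\tAS:i:200\tXS:i:200"
unique = "read2\t0\tcrsB\t1\t20\t200M\t*\t0\t0\t*\t*\tAS:i:200\tXS:i:195"
print(has_unique_best_hit(tied), has_unique_best_hit(unique))
```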
Hi,
I have been following the protocol in "lentiMPRA and MPRAflow for high-throughput functional characterization of gene regulatory elements" in combination with the documentation in the MPRAflow GitHub repository.
In step 167 of the paper you have a "--variants" option. I am interested in quantifying differential activity of single-nucleotide variants, so I think this may be the option for me. However, I have noticed that this option has now been removed on GitHub; I can’t see it in the source code or in any other documentation.
Is there something equivalent to this option? At the moment, my fastq files are not finding any barcodes to associate with candidate sequences in the association step. I believe this is because the alignment is failing, as the fastq sequences map to multiple sequences in the design.fa file due to their similarity (they only differ by one base). Is there a way I can resolve this?
Many thanks,
Molly