variants option #31

mollywent · 2020-12-02T10:30:25Z

Hi,

I have been following the protocol in "lentiMPRA and MPRAflow for high-throughput functional characterization of gene regulatory elements" in combination with the documentation on MPRAflow github.

In the paper, step 167 has a "--variants" option. I am interested in quantifying differential activity of single-nucleotide variants, so I think this may be the option for me. However, I have noticed that on GitHub this option has been removed: I can't see it in the source code or any other documentation.

Is there something equivalent to this option? At the moment, my fastq files are not finding any barcodes to associate with candidate sequences in the association step. I believe this is because the alignment is failing, as the fastq sequences map to multiple sequences in the design.fa file due to their similarity (they only differ by one base). Is there a way I can resolve this?

Many thanks,
Molly

visze · 2020-12-04T17:01:56Z

Hi Molly,

Thank you for your issue. You are right, the variant function was removed at some point, and I think we should bring it back to make it consistent with the publication. Also, the association workflow should be able to handle single-nucleotide changes between oligos.

But for the association workflow I have to ping @graciegordon because this step is different between SF (implemented in MPRAflow) and our Berlin pipeline. We can handle single base-pair differences (ref/alt), as you always have it when you have designed oligos with one SNV. So I am happy to help to get you the association and its pickle file, needed for the count workflow.

The count workflow is not aware of variants, but it does not have to be. When you have the final count table you can simply use the ratios between alt and ref. Or try exporting everything to MPRAnalyze, which is also able to generate variant effects (in a more sophisticated way).

Best,
Max

graciegordon · 2020-12-04T19:35:08Z

Hi Molly,

Thanks for bringing this to our attention! I had a branch where I was testing the variant option, but it seems that this never got fully integrated. I'll try to fix this as soon as possible.

One thing that can be done in the meantime with the current pipeline is to use the --cigar flag. This is a simple filter that can be used to force exact matches between the association sequencing and the sequences that are in the design file (the oligos that were ordered). So if your candidate sequence was 200bp, you would just pass 200M to the cigar flag. As a warning, this is a very stringent threshold and runs the risk of removing barcodes from the analysis.
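
As a rough illustration of what such a filter does, the Python sketch below keeps only SAM records whose CIGAR string equals the expected value. It parses plain SAM text and is not MPRAflow's own code; the function name `filter_by_cigar` and the toy records are hypothetical. One caveat from the SAM format itself: the `M` operation covers both matching and mismatching bases, so `200M` demands a full-length, gap-free, unclipped alignment rather than strict base-by-base identity.

```python
def filter_by_cigar(sam_lines, expected_cigar):
    """Yield SAM alignment lines whose CIGAR (column 6) equals expected_cigar."""
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        # A valid alignment record has at least 11 mandatory columns.
        if len(fields) >= 11 and fields[5] == expected_cigar:
            yield line


# Toy records (hypothetical read and oligo names), 10 bp long for brevity.
toy_sam = [
    "@SQ\tSN:oligo_ref\tLN:10",
    "read1\t0\toligo_ref\t1\t42\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read2\t0\toligo_ref\t1\t42\t4M1D6M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read3\t0\toligo_ref\t1\t42\t8M2S\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]

# Only the full-length, gap-free alignment survives the filter.
kept = [line.split("\t")[0] for line in filter_by_cigar(toy_sam, "10M")]
print(kept)
```

Reads with a deletion (`4M1D6M`) or soft clipping (`8M2S`) are dropped, which is why the threshold can remove a noticeable share of barcodes.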

To add to Max's comment about MPRAnalyze, we do have a flag that should allow for easy export to this pipeline in the count workflow.

Hope this helps,
Gracie

mollywent · 2020-12-15T14:56:06Z

Hi,

Thank you very much for your advice and your patience in waiting for my response (I have been troubleshooting my results for the association step).

I am still finding that no barcodes associate with candidate regulatory sequences. As I mentioned in my previous comment I believe this is because the alignment is failing, as the fastq sequences map to multiple sequences in the design.fa file due to their similarity.

I’d hoped the variant flag (and workarounds you suggested) would help this, but am still having no association.

Could I please check whether the pipeline can handle candidate regulatory sequences tested in forward and reverse? I have forward and reverse complement sequences in my design file and I wondered if these may be ‘seen’ as similar by the aligner.

Many thanks,
Molly

graciegordon · 2020-12-17T19:00:13Z

Hi Molly,

The pipeline hasn't been tested on any data where the forward and reverse sequences were included in the same oligo pool. I think you're right that they might be seen as similar by the aligner. You should be able to check this by looking at the SAM files from your alignment using the command "samtools view -F 20 your_file.sam", which keeps only mapped reads aligned in the forward direction, and then checking whether the MAPQ scores are similar between the forward mapping and the reverse mapping. This is a flag that I think we could add as an option to the pipeline if it's helpful.
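
To make the check concrete: `-F 20` excludes any record with flag bit 4 (read unmapped) or bit 16 (read on the reverse strand) set, since 20 = 4 + 16. A minimal Python sketch of the same test, plus a per-strand MAPQ summary, might look like this. The helper names and toy records are hypothetical and only illustrate the idea.

```python
from statistics import mean

UNMAPPED = 0x4   # SAM flag bit: read unmapped
REVERSE = 0x10   # SAM flag bit: read aligned to the reverse strand


def passes_F20(flag):
    """True if a record with this flag would be kept by `samtools view -F 20`."""
    return flag & (UNMAPPED | REVERSE) == 0


def mapq_by_strand(sam_lines):
    """Return the mean MAPQ of mapped reads, split by alignment strand."""
    mapqs = {"forward": [], "reverse": []}
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & UNMAPPED:
            continue
        strand = "reverse" if flag & REVERSE else "forward"
        mapqs[strand].append(mapq)
    return {strand: mean(values) for strand, values in mapqs.items() if values}


# Toy records (hypothetical): two forward reads, one reverse read, one unmapped.
toy_sam = [
    "read1\t0\toligo_A\t1\t1\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read2\t16\toligo_A\t1\t0\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read4\t0\toligo_A\t1\t3\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]

summary = mapq_by_strand(toy_sam)
print(summary)
```

Uniformly low MAPQ values on both strands would be consistent with reads mapping equally well to a sequence and its reverse-complement partner.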

Best,
Gracie

mollywent · 2021-01-25T14:38:14Z

Hi,

Thanks for your message Gracie. Using the -F flag, it does look as though the forward and reverse sequences are being read the same by the aligner. I wondered whether it would be possible for @visze to point me in the direction of the Berlin pipeline which has the variant option available, as I'd also be interested to see how this works with our data. Is it in an older version of the pipeline on GitHub? At the moment we are looking at workarounds to manage the forward/reverse sequences in the library.

Many thanks,

Molly

visze · 2021-02-01T09:27:19Z

Hi Molly,
Sorry for my late reply. The forward and reverse sequences will be an issue, because to the mapper they are identical and it will always map the reads to both reference sequences. @vagarwal87 solved this by adding the designed adapters to the sequence. They should be different on both ends, resulting in a unique reference sequence for mapping. Then, fingers crossed, MPRAflow should also give you some results.
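
The reasoning can be sketched in a few lines of Python. An aligner considers both strands, so a sequence and its reverse complement are effectively the same reference; flanking both with the same pair of distinct adapters breaks that symmetry. The adapter and candidate sequences below are placeholders made up for illustration, not the adapters from the protocol.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def with_adapters(seq, five_prime, three_prime):
    """Flank a candidate sequence with its 5' and 3' adapters."""
    return five_prime + seq + three_prime


# Hypothetical adapters; they must not be reverse complements of each other.
ADAPTER_5 = "AGGACCGGATCAACT"
ADAPTER_3 = "CATTGCGTGAACCGA"

candidate = "ACGTTGCAAGGCTTAACG"           # hypothetical candidate sequence
forward_ref = candidate
reverse_ref = reverse_complement(candidate)

# Without adapters, the reverse entry read on the opposite strand is exactly
# the forward entry, so a read fits both references equally well.
bare_ambiguous = reverse_complement(reverse_ref) == forward_ref

# With adapters, the two references differ on either strand.
forward_full = with_adapters(forward_ref, ADAPTER_5, ADAPTER_3)
reverse_full = with_adapters(reverse_ref, ADAPTER_5, ADAPTER_3)
full_ambiguous = reverse_complement(reverse_full) == forward_full

print(bare_ambiguous, full_ambiguous)
```

This only helps if the association reads actually extend into the adapter bases; reads covering just the candidate sequence would remain ambiguous.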

Our pipeline is not available online, but we are happy to help. It might be best if you contact us directly and we can schedule a short online meeting. You can find my email and phone number on our website.

Best,
Max

tjflem · 2021-02-19T04:59:49Z

Hi @visze and @graciegordon, similar to @mollywent I am making an MPRA library to test SNVs and have been closely following the lentiMPRA protocol. Has the --variants option been added back to the association utility? If not, should I expect MPRAflow to be able to discern between my ref and alt alleles (as suggested by @visze)? Thank you guys for your continued support for MPRAflow!
Best,
Travis

visze · 2021-02-22T07:26:29Z

Hi Travis,

I am sorry, but the option is not in yet.

But good news: I think you do not need it. :-) When you design your library you know the exact reference and the location of the variant. You can completely forget that you have variants until the end of the count workflow. Just include both the reference and the alternative sequence in your reference FASTA file for mapping. They should differ by one bp when you have an SNV.
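
For illustration, a design FASTA with one entry per allele could be generated along these lines. The naming scheme (`_ref` and `_alt` suffixes), the helper names, and the toy sequence are assumptions made for this sketch, not an MPRAflow requirement.

```python
def make_alt(ref_seq, pos, ref_base, alt_base):
    """Return ref_seq with the base at 0-based pos changed to alt_base."""
    if ref_seq[pos] != ref_base:
        raise ValueError(
            f"expected {ref_base} at position {pos}, found {ref_seq[pos]}"
        )
    return ref_seq[:pos] + alt_base + ref_seq[pos + 1:]


def fasta_records(name, ref_seq, pos, ref_base, alt_base):
    """Return FASTA text with one reference and one alternative entry."""
    alt_seq = make_alt(ref_seq, pos, ref_base, alt_base)
    return f">{name}_ref\n{ref_seq}\n>{name}_alt\n{alt_seq}\n"


# Toy 10 bp oligo (hypothetical) with an A>G change at 0-based position 4.
fasta = fasta_records("enhancer1", "ACGTACGTAC", 4, "A", "G")
print(fasta, end="")
```

Checking the stated reference base before substituting catches off-by-one errors in variant coordinates early.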

Then you should use the 200M cigar flag as @graciegordon wrote before:

> One thing that can be done in the meantime with the current pipeline is to use the --cigar flag. This is a simple filter that can be used to force exact matches between the association sequencing and the sequences that are in the design file (the oligos that were ordered). So if your candidate sequence was 200bp, you would just pass 200M to the cigar flag. As a warning this is a very stringent threshold and runs the risk of removing barcodes from the analysis.

Yes, it is very stringent, but normally this is absolutely enough, and you will reduce errors in your assignment, which gives you more stable counts (if you can assign enough barcodes).

Only at the end of the count workflow, when you have the counts, you can bring reference and alternative sequence of variants together and get the fold changes.
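
A minimal sketch of that last step, assuming a final table that holds one RNA/DNA activity ratio per oligo and oligo names ending in `_ref` or `_alt`. Both are assumptions for this example; the actual MPRAflow output columns may differ, and MPRAnalyze would give a proper statistical model instead of a plain ratio.

```python
import math


def variant_effects(activity):
    """Map variant name to log2(alt / ref), given {oligo_name: activity ratio}."""
    effects = {}
    for name, ref_value in activity.items():
        if not name.endswith("_ref"):
            continue
        variant = name[: -len("_ref")]
        alt_value = activity.get(variant + "_alt")
        # Skip incomplete pairs and non-positive ratios (log2 is undefined).
        if alt_value is None or ref_value <= 0 or alt_value <= 0:
            continue
        effects[variant] = math.log2(alt_value / ref_value)
    return effects


# Toy activity ratios (hypothetical); enhancer2 has no alt entry and is skipped.
toy_activity = {"enhancer1_ref": 1.0, "enhancer1_alt": 2.0, "enhancer2_ref": 0.8}
effects = variant_effects(toy_activity)
print(effects)
```

A positive value means the alternative allele is more active than the reference, a negative value means it is less active.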

colinshew · 2021-07-22T00:36:40Z

Also jumping in here -- my library design contains multiple variants per target sequence (3-5), and I'm wondering what the recommended approach would be. I am getting decent results so long as I don't include a MAPQ filter, as most sequences map to multiple CRSs. However, can I be sure that the Association utility uses the "best" alignment, when there is one (perhaps on the basis of the AS/XS tags)? Also curious how ties would be dealt with. Apologies if this is explained somewhere, but I couldn't find documentation about this behavior. Thanks!
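
How the Association utility handles this is not answered in this thread, so the following is only a sketch of what an AS/XS-based check could look like when run on the SAM file outside the pipeline. It assumes the aligner writes the `AS:i` (alignment score) and `XS:i` (score of the best alternative alignment) optional tags, as BWA-MEM and Bowtie 2 do; the function names and toy records are hypothetical. A read counts as having a single best hit when AS is strictly greater than XS, and as ambiguous otherwise.

```python
def optional_int_tags(sam_line):
    """Return the integer-typed optional tags of a SAM record as a dict."""
    tags = {}
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        tag, value_type, value = field.split(":", 2)
        if value_type == "i":
            tags[tag] = int(value)
    return tags


def classify_hit(sam_line):
    """Return 'unique_best', 'ambiguous', or 'unknown' based on AS and XS."""
    tags = optional_int_tags(sam_line)
    if "AS" not in tags:
        return "unknown"
    # Assumption: a missing XS tag means no alternative alignment was found.
    if "XS" not in tags or tags["AS"] > tags["XS"]:
        return "unique_best"
    return "ambiguous"


# Toy records (hypothetical) with different AS/XS combinations.
base = "read\t0\toligo_A\t1\t0\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII"
labels = [
    classify_hit(base + "\tNM:i:0\tAS:i:10\tXS:i:5"),
    classify_hit(base + "\tNM:i:0\tAS:i:10\tXS:i:10"),
    classify_hit(base + "\tNM:i:0"),
]
print(labels)
```

Counting how many reads fall into each class would show how much of the library depends on tie-breaking rather than a clear best alignment.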

Colin
