Questions About MPRAflow - Low CRS-BC Associations and Sequencing Details #85
MPRAflow has known issues when the edit distances between oligos are small. The mapper (bwa) seems to have trouble assigning reads to the "BEST" oligo: it sees both oligos (e.g., the reference and the alternative when you have a variant) as best hits and therefore discards the read. An alternative is MPRAsnakeflow, which uses the bbmap strategy that is optimized for variants (or short edit distances). Your assignment reads have to overlap! We use the tool fastq-join for this. You can run it yourself to see how many reads you are able to merge. Here is the line of code: line 331 in 8874270.
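Conceptually, merging a read pair means finding where the end of R1 overlaps the reverse complement of R2 and stitching the two into one sequence. The Python sketch below illustrates that idea only; it is not fastq-join's actual algorithm, and `merge_pair`, `min_overlap` and `max_mismatch_frac` are hypothetical names and thresholds chosen for illustration. If R1 and R2 cover non-overlapping halves of an oligo, no overlap passes and the pair cannot be merged.

```python
# Simplified illustration of overlap-based read-pair merging.
# Not fastq-join's implementation; names and thresholds are hypothetical.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 10,
               max_mismatch_frac: float = 0.08):
    """Merge R1 with the reverse complement of R2 if their ends overlap.

    Overlaps are tried from longest to shortest; the first one whose
    mismatch count stays within max_mismatch_frac is used. Returns the
    merged sequence, or None if no overlap of at least min_overlap passes.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    r2_rc = reverse_complement(r2)
    longest = min(len(r1), len(r2_rc))
    for overlap in range(longest, min_overlap - 1, -1):
        r1_tail = r1[-overlap:]     # last `overlap` bases of R1
        r2_head = r2_rc[:overlap]   # first `overlap` bases of rev-comp R2
        mismatches = sum(a != b for a, b in zip(r1_tail, r2_head))
        if mismatches <= max_mismatch_frac * overlap:
            # Keep R1 as-is and append the non-overlapping part of R2.
            return r1 + r2_rc[overlap:]
    return None


if __name__ == "__main__":
    insert = "ATGCGTACCTGAAGTCCGATTAGCAGTCAA"   # 30 bp toy insert
    r1 = insert[:20]                          # forward read
    r2 = reverse_complement(insert[10:])      # reverse read, 10 bp overlap
    print(merge_pair(r1, r2) == insert)       # prints True
```

Trying the longest overlap first favors the most conservative merge; real tools add quality-aware consensus calling in the overlap, which this sketch omits.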
Your low number of BCs can be due to the low number of merged reads. There are two solutions.
But you also say that you have variants. If a variant lies in the center of the oligo, this approach will fail completely, because it introduces errors in the center of the oligos. When you have variants in the center, you really need overlapping paired-end sequences.
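To see why a centrally located variant is a problem, consider a deliberately simplified model that counts ungapped mismatches between a read and the matching stretch of each oligo. This is a hypothetical illustration, not how bwa or bbmap score alignments: a read that only covers the first half of the oligo scores identically against the reference and alternative oligos, so it cannot be assigned (an aligner such as bwa typically reports such equally good hits with MAPQ 0), while a merged read spanning the variant position can be.

```python
# Toy model of read-to-oligo assignment by ungapped mismatch counting.
# Hypothetical illustration only; real aligners (bwa, bbmap) score differently.
from typing import Dict, Optional


def mismatches(a: str, b: str) -> int:
    """Count mismatching positions (compares up to the shorter sequence)."""
    return sum(x != y for x, y in zip(a, b))


def assign_read(read: str, oligos: Dict[str, str], offset: int = 0) -> Optional[str]:
    """Return the single best-matching oligo name, or None if the best score is tied.

    The read is compared to oligo[offset:offset + len(read)], i.e. we assume
    the read's start position on the oligo is already known.
    """
    window = slice(offset, offset + len(read))
    scores = {name: mismatches(read, seq[window]) for name, seq in oligos.items()}
    best = min(scores.values())
    winners = [name for name, score in scores.items() if score == best]
    return winners[0] if len(winners) == 1 else None


if __name__ == "__main__":
    ref = "ACGTACGTAGCTAGCA"
    alt = ref[:8] + "G" + ref[9:]         # single-base variant at position 8
    oligos = {"ref": ref, "alt": alt}
    print(assign_read(ref[:8], oligos))   # first half only: tie, prints None
    print(assign_read(alt, oligos))       # full-length read: prints alt
```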
Thank you so much for taking the time to provide such detailed feedback and for introducing me to MPRAsnakeflow. I would like to update you on my recent progress. Initially, my MPRAflow program encountered an error:
Based on the error message, I suspected that some of my sequences might be missing quality information. However, after checking all the sequences, I found that they all had quality scores. So, initially, I simply modified the MPRAflow/src/nf_ori_map_barcodes.py file at line 117:
I added a condition to handle missing quality scores:
After this change, the code ran normally and completed successfully. However, the result was the one I sent you via email. Then, when I ran the … Regardless, I just wanted to share the issues I've encountered in the hope that this feedback might help improve MPRAflow. Thanks again for your enthusiastic help and guidance, as well as for developing this amazing tool!
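A quick way to double-check the suspicion that some records lack quality information is to validate the FASTQ structure directly. The sketch below is a hypothetical helper, not part of MPRAflow; it assumes plain 4-line FASTQ records and reports records whose header, separator, or quality length look wrong. `iter_fastq_problems` accepts any iterable of lines, and `check_fastq` opens `.gz` files with `gzip.open(path, "rt")`.

```python
# Hypothetical FASTQ sanity check; assumes plain 4-line records.
import gzip
from typing import Iterable, Iterator, List, Tuple


def iter_fastq_problems(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (record_number, message) for records that look malformed."""
    it = iter(lines)
    record = 0
    for header in it:
        if not header.strip():
            continue  # ignore blank lines, e.g. a trailing newline at EOF
        record += 1
        seq = next(it, "").rstrip("\r\n")
        plus = next(it, "")
        qual = next(it, "").rstrip("\r\n")
        if not header.startswith("@"):
            yield record, "header does not start with '@'"
        if not plus.startswith("+"):
            yield record, "separator line does not start with '+'"
        if len(qual) != len(seq):
            yield record, f"quality length {len(qual)} != sequence length {len(seq)}"


def check_fastq(path: str, limit: int = 10) -> List[Tuple[int, str]]:
    """Return up to `limit` problems found in a (possibly gzipped) FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    problems = []
    with opener(path, "rt") as handle:
        for problem in iter_fastq_problems(handle):
            problems.append(problem)
            if len(problems) >= limit:
                break
    return problems
```

An empty result from `check_fastq("R1.fq.gz")` would be consistent with the finding that every record has quality scores.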
I want to sincerely thank you for developing the lentiMPRA protocol and the MPRAflow tool. This is truly an outstanding piece of work, and it has been incredibly helpful to the field of high-throughput functional genomics. I deeply appreciate the effort and expertise that went into creating such a comprehensive framework.
Recently, I followed your experimental protocol for performing lentiMPRA, and I am now using MPRAflow for analysis. However, I have encountered some issues in the Association step that I would like to ask for your advice on. Specifically, I am observing a surprisingly low number of CRS-BC (candidate regulatory sequence - barcode) associations.
Here are the details:
• In my experimental design, each CRS was assigned 185 barcodes. I have a total of 7,530 CRS, and the sequencing depth I used was 19 million reads (more than 185 × 7,530 × 10 ≈ 13.9 million, i.e., more than 10 reads per barcode).
• Despite this, the number of associated CRS and barcodes is very low. Even before filtering, the median number of barcodes per CRS is far below 20, and after filtering, the associations drop even further.
• Some of my CRS sequences only differ by a single nucleotide, so I set the mapq parameter to 0 in an attempt to increase associations. However, this setting seemed to have no impact, as the output remained the same.
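For reference, the depth arithmetic works out as follows: 185 × 7,530 = 1,393,050 designed barcodes, so 19 million reads correspond to roughly 13.6 reads per barcode if every read were usable; reads lost at merging or mapping reduce this directly. The sketch below reproduces that calculation and computes a median barcodes-per-CRS value from a hypothetical association table of `(barcode, crs, read_count)` rows. The table layout and the `min_reads` filter are assumptions for illustration, not MPRAflow's output format or filtering rules.

```python
# Depth arithmetic plus a hypothetical barcodes-per-CRS summary.
from collections import defaultdict
from statistics import median
from typing import Iterable, Tuple


def reads_per_barcode(total_reads: int, n_crs: int, bcs_per_crs: int) -> float:
    """Average reads available per designed barcode if every read were usable."""
    return total_reads / (n_crs * bcs_per_crs)


def median_bcs_per_crs(rows: Iterable[Tuple[str, str, int]], min_reads: int = 1) -> float:
    """Median number of distinct barcodes per CRS.

    rows are hypothetical (barcode, crs, read_count) tuples. Barcodes with
    fewer than min_reads reads are dropped; a CRS left with no barcodes is
    not counted at all.
    """
    barcodes = defaultdict(set)
    for barcode, crs, count in rows:
        if count >= min_reads:
            barcodes[crs].add(barcode)
    return median([len(bcs) for bcs in barcodes.values()]) if barcodes else 0.0


if __name__ == "__main__":
    print(f"{reads_per_barcode(19_000_000, 7_530, 185):.1f}")  # prints 13.6
```

Because CRS with zero passing barcodes drop out, this median can look better than the design-level picture; counting them explicitly against the design file would pull it down.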
I am quite puzzled by these results and was wondering if you could help me identify the potential cause. Could this issue arise from my parameter settings, or is it more likely to be a problem with my experimental design or sequencing?
Additionally, I have one more question about the Association step:
• In the sequencing files for this step, the R1.fq.gz file contains the first half of each CRS sequence, while R2.fq.gz contains the second half. How does the program associate barcodes with CRS sequences in this case? I would have expected that at least one read would contain both the barcode and the CRS for the association to be made.
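One common way to link a barcode with an insert that is split across R1 and R2 is by read name: the insert reads and the barcode read from the same cluster share a name, so the files can be walked in lockstep. The sketch below assumes the barcode is in a separate, identically ordered FASTQ file (e.g., an index read); that file layout and the `read_fastq`/`link_reads` helpers are assumptions for illustration, not a description of MPRAflow's internals. In a real pipeline the insert pair would still have to be merged and mapped to an oligo before the barcode can be assigned to a CRS.

```python
# Hypothetical sketch: link split insert reads to their barcode read by name.
from typing import Iterable, Iterator, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) from 4-line FASTQ records.

    The name is the first whitespace-separated token of the header without
    '@', so "@q1 1:N:0:1" and "@q1 2:N:0:1" both give "q1".
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue
        seq = next(it, "").rstrip("\r\n")
        next(it, "")  # '+' separator line
        next(it, "")  # quality line (unused here)
        yield header[1:].split()[0], seq


def link_reads(r1_lines: Iterable[str], r2_lines: Iterable[str],
               bc_lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (barcode, r1_seq, r2_seq), requiring identical names in all three files.

    zip stops at the shortest file; a real tool should also check that all
    files end together.
    """
    records = zip(read_fastq(r1_lines), read_fastq(r2_lines), read_fastq(bc_lines))
    for (n1, s1), (n2, s2), (nb, bc) in records:
        if not (n1 == n2 == nb):
            raise ValueError(f"read names out of sync: {n1}, {n2}, {nb}")
        yield bc, s1, s2
```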
Thank you so much for your time and assistance. I greatly appreciate your help, and I look forward to hearing your insights.