FIRE on targeted seq data: fiber-locations-shuffled.bed.gz is created empty #7
Given how you set up your exclude, I think you can drop the … This might fix the issue, but with targeted data the coverage calculations and filters might be removing everything. For example, I am not sure whether you are getting even coverage across targets. If not, the coverage filters could end up very strict. You can find files with the filters applied in …
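To illustrate why uneven coverage between targets can empty the output, here is a minimal sketch. It assumes a filter that keeps positions with coverage of at least `min_coverage` and within `coverage_within_n_sd` standard deviations of the median, with the SD approximated as sqrt(median). That rule and the helper names (`coverage_window`, `passing_positions`) are assumptions for illustration, not FIRE's actual implementation.

```python
import statistics

# ASSUMPTION: a median-centred coverage window with a Poisson-like SD of
# sqrt(median). The names echo the config keys min_coverage and
# coverage_within_n_sd, but this is NOT FIRE's code.


def coverage_window(coverages, min_coverage, n_sd):
    """Return (low, high) coverage bounds derived from the median coverage."""
    med = statistics.median(coverages)
    sd = med ** 0.5  # assumed spread: sd ~ sqrt(median)
    low = max(min_coverage, med - n_sd * sd)
    high = med + n_sd * sd
    return low, high


def passing_positions(coverages, min_coverage, n_sd):
    """Indices of positions whose coverage falls inside the window."""
    low, high = coverage_window(coverages, min_coverage, n_sd)
    return [i for i, c in enumerate(coverages) if low <= c <= high]


# Two targets: one sequenced at 600X, one at 150X.
uneven = [600] * 5 + [150] * 5
print(passing_positions(uneven, min_coverage=4, n_sd=5))  # []
```

With one target at 600X and another at 150X, the median lands between them (375X), and neither target falls inside the roughly 278X to 472X window, so every position is filtered out even though both targets are well covered.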
More generally, I should note that I did not build this with targeted data in mind. That said, @StephanieBohaczuk might have some advice on making a config file, since I think she has done this successfully before.
@mrvollger May I ask how I should specify multiple chromosomes to keep, such as chr4, chr7, and chr20, using "keep_chromosomes:"? Thanks!
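If `keep_chromosomes` is interpreted as a regular expression over chromosome names (an assumption worth checking against the FIRE documentation), one way to keep several chromosomes is an alternation such as `^chr(4|7|20)$`. The sketch below, with a hypothetical `kept` helper, shows which names such a pattern would select. The `^...$` anchors make it behave the same whether the tool uses a search or a full match.

```python
import re

# ASSUMPTION: keep_chromosomes is treated as a regular expression matched
# against chromosome names. The anchors stop "chr4" from also matching
# contigs such as "chr4_KI270788v1_alt".
keep_chromosomes = r"^chr(4|7|20)$"


def kept(chrom_names, pattern):
    """Hypothetical helper: return the chromosome names the pattern selects."""
    regex = re.compile(pattern)
    return [name for name in chrom_names if regex.search(name)]


names = ["chr1", "chr4", "chr7", "chr17", "chr20", "chr4_KI270788v1_alt"]
print(kept(names, keep_chromosomes))  # ['chr4', 'chr7', 'chr20']
```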
@Strausyatina @mrvollger For targeted data, I usually first filter the bam for location(s) I'm interested in with
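The exact command did not survive in this thread. As a rough illustration of what "filter the BAM for locations of interest" means, here is a minimal pure-Python sketch of the overlap test, assuming reads are reduced to (name, chrom, start, end) tuples and targets come from a BED file with 0-based, half-open coordinates. `load_targets` and `reads_on_target` are hypothetical helpers. In practice a BAM-aware tool such as samtools would do this step.

```python
def load_targets(bed_lines):
    """Parse BED lines into {chrom: [(start, end), ...]} (0-based, half-open)."""
    targets = {}
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment, and header lines
        chrom, start, end = line.split()[:3]
        targets.setdefault(chrom, []).append((int(start), int(end)))
    return targets


def reads_on_target(reads, targets):
    """Names of reads (name, chrom, start, end) overlapping at least one target."""
    kept = []
    for name, chrom, start, end in reads:
        # Half-open intervals [s, e) and [start, end) overlap iff s < end and start < e.
        if any(s < end and start < e for s, e in targets.get(chrom, [])):
            kept.append(name)
    return kept


targets = load_targets(["chr4\t100\t200", "chr7\t1000\t2000"])
reads = [
    ("r1", "chr4", 150, 300),   # overlaps chr4:100-200
    ("r2", "chr4", 200, 250),   # starts exactly where the target ends: no overlap
    ("r3", "chr7", 500, 1001),  # overlaps chr7:1000-2000 by one base
    ("r4", "chr20", 0, 50),     # no targets on chr20
]
print(reads_on_target(reads, targets))  # ['r1', 'r3']
```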
P.S. Sorry for the late reply. This went to a strange section of my inbox and I just saw it.
@StephanieBohaczuk Thank you very much for your reply! Are you using the latest FIRE release, version 0.0.3? I noticed that the previous release, 0.0.2, did not have the min_coverage, coverage_within_n_sd, and min_per_acc_peak options in the config.yaml file. We are having difficulty achieving enough coverage with targeted sequencing on the PacBio platform, but we have over 600X coverage with ONT data, so I'm trying to run FIRE on ONT data. I tried the following configuration:
It finds fibers across the genome and estimates median coverage, but still fails when running the coverage rule. It seems as if the script keeps counting coverage in the regions that I specified as excluded in the config file (see the config file).
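Why counting coverage outside the targets matters: with targeted data, most of the genome has near-zero coverage, so a genome-wide median collapses toward zero. Below is a toy sketch with a hypothetical `median_coverage` helper that compares the median over all positions with the median restricted to the target intervals. It assumes per-base coverage held in a Python list and targets given as half-open (start, end) pairs.

```python
import statistics


def median_coverage(coverage, regions=None):
    """Median per-base coverage, optionally restricted to half-open (start, end) regions."""
    if regions is None:
        return statistics.median(coverage)
    values = [coverage[i] for start, end in regions for i in range(start, end)]
    return statistics.median(values)


# Toy 1 kb "genome": a single 100 bp target at 600X, nothing elsewhere.
coverage = [0] * 1000
for i in range(400, 500):
    coverage[i] = 600

print(median_coverage(coverage))               # 0.0 (genome-wide)
print(median_coverage(coverage, [(400, 500)]))  # 600.0 (targets only)
```

Any coverage window centred on the genome-wide median would then sit around 0X and exclude the on-target positions, which is consistent with the coverage step failing on targeted data.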
Hi @NurislamSheih.
I've been using 0.0.2, but there were a few small updates between 0.0.2 and 0.0.3 that were not tagged as versions. Here's the exact version:
Sorry, but I've never tried using the exclude regions. I thought they were just for training, but @mrvollger would be able to clarify. Instead, I typically just filter my BAM for regions of interest with samtools (see my comment above for more details) and then run the filtered BAM through the pipeline.
Exclude is only used in making the FDR null regions, not in calculating the initial coverage profiles. (Did FIRE actually call anything for you, i.e., is the FIRE BAM perhaps empty?) I am/was not planning on supporting targeted data in FIRE, but after discussing it with @StephanieBohaczuk I see the utility, even though the FDR calculation won't work well. I'd be happy to review a PR that adds the ability to accept a BED file of target regions to narrow the analysis. If you don't want to tackle this and want me to do it, I'll need a minimal targeted test case (small BAM and small genome) that recreates the issues you have been having and that I can run in a few minutes. And then some patience, since I won't be able to get to this right away.
Hi Mitchell!
We've tried to run FIRE on targeted sequencing data, and the pipeline fails with "polars.exceptions.NoDataError: empty CSV" because fiber-locations-shuffled.bed.gz is created empty.
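One way to surface this failure more clearly is to check the gzipped BED for records before handing it to a CSV reader. The sketch below is a hypothetical pre-check, not part of FIRE: `has_records` returns False when the decompressed file has no non-blank lines, so a caller could stop with an explicit message instead of the polars "empty CSV" error.

```python
import gzip


def has_records(path):
    """Hypothetical pre-check: True if a gzipped BED file has at least one non-blank line."""
    with gzip.open(path, "rt") as fh:
        return any(line.strip() for line in fh)


def require_records(path):
    """Raise a descriptive error instead of letting a downstream CSV reader fail."""
    if not has_records(path):
        raise ValueError(f"{path} contains no records; no fibers were written to it")
```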
A BED file containing the complement of the targeted regions was used for exclusion in filtered_and_shuffled_fiber_locations_chromosome.
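One plausible mechanism for the empty shuffled file: if the exclusion BED is the complement of the targets, the only places a shuffled fiber can land are the targets themselves. Any fiber longer than the largest target then cannot be placed without overlapping the exclusion. The sketch below assumes each shuffled fiber must fall entirely outside excluded regions. It builds the complement from chromosome sizes and lists which fiber lengths could still fit. `complement` and `placeable` are hypothetical helpers, and whether the real shuffling step drops or errors on such fibers is not stated here.

```python
def complement(targets, chrom_sizes):
    """Regions of each chromosome not covered by any target (0-based, half-open)."""
    out = []
    for chrom, size in chrom_sizes.items():
        pos = 0
        for start, end in sorted(targets.get(chrom, [])):
            if start > pos:
                out.append((chrom, pos, start))
            pos = max(pos, end)  # handles overlapping targets
        if pos < size:
            out.append((chrom, pos, size))
    return out


def placeable(fiber_lengths, targets):
    """Fiber lengths that fit inside at least one target, i.e. avoid the exclusion."""
    longest = max((end - start for ivs in targets.values() for start, end in ivs), default=0)
    return [length for length in fiber_lengths if length <= longest]


targets = {"chr4": [(2000, 3000), (6000, 6500)]}
print(complement(targets, {"chr4": 10_000}))
# [('chr4', 0, 2000), ('chr4', 3000, 6000), ('chr4', 6500, 10000)]
print(placeable([500, 1500, 20_000], targets))  # [500]
```

Long ONT fibers are often far longer than a capture target, so under this assumption almost none would survive the shuffle.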
What could be wrong with how we are using FIRE? Is it suitable for this kind of task?
Config yaml:
Example of error log:
Exclusion bed file: