Non-identical length distribution #42
Comments
@schorlton I personally am not sure which I think is "better". More info is rarely bad. But I'm definitely interested in your opinion on "better" in a general sense. Input is always appreciated! We will definitely consider any suggested change or enhancement.
Thanks for the quick response! I tend to agree that more info is better. However, this is a somewhat breaking change for use with MultiQC (which I expect many falco users also use). What I would possibly suggest is a PR to MultiQC to smooth the line or format this plot as a bar graph (basically a very granular histogram) instead of a line graph. It's hard to tell what it would look like before it is implemented, and it would need to work with both tools, but it seems the trend is more important than the individual sizes. Between 0 and ~7500 bp on the plot above, the line is definitely too thick to be useful. An alternative would be to reproduce FastQC behaviour, or something closer to it than a bin size of 1 for the read length distribution?
We'll see how to take a first stab at this and leave this issue open until we can say something on it.
Hello,

When implementing the sequence length module, I made a somewhat executive decision not to group the lengths, because I assumed that, in any long-read dataset (where this module is often relevant), the number of reads would never generate gigantic bar plots. That said, this was a bad decision. It's not our call to decide on the behavior of the module, but rather to emulate it faithfully. I'll work on creating base groups for this module. It will be disabled if

If I may ask: I'm very curious about additional insights that MultiQC provides that are not already available in falco's HTML output. One of our goals in making falco was modernizing the FastQC plots, which I believe is similar to what MultiQC provides. In that spirit, the falco HTML plots for sequence lengths are bar plots, like you suggested (and I fully agree). Is MultiQC advantageous in this case because you can merge QC metrics for multiple datasets? Or because you can create customized tools for additional summary statistics beyond what FastQC provides?
Hi @guilhermesena1, sorry for the very late reply and thank you for your input and work on Falco. I look forward to a solution that facilitates integration with MultiQC. MultiQC aggregates reports from many tools which go well beyond read quality control (see MultiQC Modules). The functionality of MultiQC and HTML reports from FastQC/Falco are hard to compare as they are different in aim and scope. Thanks again!
Same file. Running falco v1.2.1 from bioconda and MultiQC 1.12. Can reproduce by running on nanopore data from SRA with long read lengths.
MultiQC report of FastQC:
MultiQC report of falco:
I believe falco calculates the length distribution for every individual length, while FastQC writes a binned histogram to fastqc_data.txt. Which is better? The granularity and detail are nice, but they can also obscure the plot. Should falco reproduce FastQC behaviour or perform some kind of binning of read lengths? Interested in your thoughts.
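To make the binning question concrete, here is a minimal sketch of collapsing a per-length count table into fixed-width bins. It does not reproduce FastQC's or falco's implementation or file format; the function name `bin_length_counts`, the `start-end` label style, and the sample counts are hypothetical and chosen only for illustration.

```python
from collections import Counter


def bin_length_counts(length_counts, bin_width):
    """Collapse {read_length: count} into fixed-width bins.

    Returns a list of ("start-end", count) tuples sorted by bin start.
    Hypothetical helper, not part of falco, FastQC, or MultiQC.
    """
    if bin_width < 1:
        raise ValueError("bin_width must be >= 1")
    binned = Counter()
    for length, count in length_counts.items():
        # Align each length to the lower edge of its bin.
        start = (length // bin_width) * bin_width
        binned[start] += count
    return [
        (f"{start}-{start + bin_width - 1}", binned[start])
        for start in sorted(binned)
    ]


if __name__ == "__main__":
    # Made-up counts standing in for a bin-size-1 length table.
    per_length = {101: 3, 150: 1, 1999: 2, 2000: 5, 7500: 1}
    for label, count in bin_length_counts(per_length, 1000):
        print(label, count)
```

Empty bins are simply omitted here; a plotting tool that expects a continuous axis would need them filled with zeros.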