I explored what that would mean for a few workflows. I think 250 might be a little strict?
This is the 250 char limit:
⏺ Here are the proposed changes for all the workflow files (.ga files):
1. BREW3R.ga
Current:
- Name: BREW3R
- Annotation: This workflow takes a collection of BAM (output of STAR) and a gtf. It extends the input gtf using de novo annotation.
Proposed Change:
- Name: BREW3R - De Novo Transcriptome Annotation
- Annotation: Extends 3' ends of gene annotations using BAM files (from STAR alignments) and a reference GTF. Specifically designed for 3'-biased
sequencing techniques like 10X scRNA-seq or BRB-seq that primarily capture transcript 3' ends.
2. goseq-go-kegg-enrichment-analsis.ga
Current:
- Name: Goseq GO-KEGG Enrichment Analysis
- Annotation: This workflow is used for GO and KEGG enrichment analysis using GOseq tools.
Proposed Change:
- Name: Gene Ontology and KEGG Pathway Enrichment Analysis
- Annotation: Performs functional enrichment analysis of gene sets using GOseq, identifying over-represented Gene Ontology terms and KEGG
pathways. Helps interpret biological significance of differentially expressed genes.
3. rnaseq-de-filtering-plotting.ga
Current:
- Name: RNAseq_DE_filtering_plotting
- Annotation: This workflow can only work on an experimental setup with exactly 2 conditions. It takes two collections of count tables as input
and performs differential expression analysis. Additionally it filters for DE genes based on adjusted p-value and log2 fold changes thresholds. It
also generates informative plots.
Proposed Change:
- Name: RNA-Seq Differential Expression Analysis with Visualization
- Annotation: Identifies differentially expressed genes between two conditions from count tables, applying statistical filters (adjusted p-value
and log2 fold change thresholds) and generating publication-quality visualizations.
4. rnaseq-pe.ga
Current:
- Name: RNA-seq for Paired-end fastqs
- Annotation: This workflow takes as input a list of paired-end fastqs. Adapters and bad quality bases are removed with fastp. Reads are mapped
with STAR with ENCODE parameters and genes are counted simultaneously as well as normalized coverage (per million mapped reads) on uniquely mapped
reads. The counts are reprocessed to be similar to HTSeq-count output. Alternatively, featureCounts can be used to count the reads/fragments per
gene. FPKM are computed with cufflinks and/or with StringTie. The unstranded normalized coverage is computed with bedtools.
Proposed Change:
- Name: RNA-Seq Analysis: Paired-End Read Processing and Quantification
- Annotation: Complete RNA-Seq analysis for paired-end data: quality control with fastp, alignment with STAR, quantification via multiple methods
(STAR, featureCounts), and expression calculation (FPKM with Cufflinks/StringTie, normalized coverage with bedtools).
5. rnaseq-sr.ga
Current:
- Name: RNA-seq for Single-read fastqs
- Annotation: This workflow takes as input a list of single-end fastqs. Adapters and bad quality bases are removed with fastp. Reads are mapped
with STAR with ENCODE parameters and genes are counted simultaneously as well as normalized coverage (per million mapped reads) on uniquely mapped
reads. The counts are reprocessed to be similar to HTSeq-count output. Alternatively, featureCounts can be used to count the reads/fragments per
gene. FPKM are computed with cufflinks and/or with StringTie. The unstranded normalized coverage is computed with bedtools.
Proposed Change:
- Name: RNA-Seq Analysis: Single-End Read Processing and Quantification
- Annotation: Complete RNA-Seq analysis for single-end data: quality control with fastp, alignment with STAR, quantification via multiple methods
(STAR, featureCounts), and expression calculation (FPKM with Cufflinks/StringTie, normalized coverage with bedtools).
These changes will make the workflow names and annotations more descriptive, consistent, and user-friendly while staying within the 250-character
limit for annotations.
This is the 500 char limit:
⏺ With a 500 character limit, I can provide more detailed annotations for each workflow. Here are my suggested improvements to make the most of the
500-character limit:
1. BREW3R
Enhanced Annotation (500 char limit):
"Extends 3' ends of gene annotations using BAM files (from STAR alignments) and a reference GTF. Specifically designed for 3'-biased sequencing
techniques like 10X scRNA-seq or BRB-seq that primarily capture transcript 3' ends. The BREW3R tool enhances annotations by using evidence from
RNA-seq data to improve 3' UTR definitions, which is particularly important for accurate quantification in single-cell and bulk RNA-seq
experiments."
2. Gene Ontology and KEGG Pathway Enrichment Analysis
Enhanced Annotation (500 char limit):
"Performs functional enrichment analysis of gene sets using GOseq, identifying over-represented Gene Ontology terms and KEGG pathways. This
workflow accounts for gene length bias in RNA-seq data when calculating enrichment statistics. Input requires differentially expressed genes and
gene lengths. Generates comprehensive tables and visualizations of enriched GO terms across all three ontologies (Biological Process, Molecular
Function, Cellular Component) as well as KEGG pathway enrichment results."
3. RNA-Seq Differential Expression Analysis with Visualization
Enhanced Annotation (500 char limit):
"Identifies differentially expressed genes between exactly two experimental conditions from count tables. The workflow performs statistical
testing, applies filters based on adjusted p-value and log2 fold change thresholds, and generates publication-quality visualizations including
volcano plots, MA plots, and heatmaps. Takes two collections of count tables as input and produces filtered gene lists and interactive plots for
interpreting expression differences. Optimal for simple two-condition experimental designs."
4. RNA-Seq Analysis: Paired-End Read Processing and Quantification
Enhanced Annotation (500 char limit):
"Complete RNA-Seq analysis for paired-end data: quality control with fastp to remove adapters and low-quality bases, alignment with STAR using
ENCODE parameters, gene quantification via multiple methods (STAR and featureCounts), and expression calculation (FPKM with Cufflinks/StringTie,
normalized coverage with bedtools). Processes raw FASTQ files to produce count tables, normalized expression values, and genomic coverage tracks.
Supports stranded and unstranded libraries, generating both HTSeq-compatible counts and normalized measures for downstream analysis."
5. RNA-Seq Analysis: Single-End Read Processing and Quantification
Enhanced Annotation (500 char limit):
"Complete RNA-Seq analysis for single-end data: quality control with fastp to remove adapters and low-quality bases, alignment with STAR using
ENCODE parameters, gene quantification via multiple methods (STAR and featureCounts), and expression calculation (FPKM with Cufflinks/StringTie,
normalized coverage with bedtools). Processes raw FASTQ files to produce count tables, normalized expression values, and genomic coverage tracks.
Supports stranded and unstranded libraries, generating both HTSeq-compatible counts and normalized measures for downstream analysis."
These enhanced annotations maintain all the essential information from the shorter versions while adding valuable context about inputs, outputs,
and specific features that would help users understand the workflows better.
The 250 char limit leads to contracting some information to the point that it isn't entirely correct anymore (see fastp, where adapter trimming isn't exactly "qc"). The 500 char version seems much better to me and what we should show in the workflow cards.
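To see which existing annotations would fit under either limit, something like the following could be run over a checkout. This is only a rough sketch: it assumes each `.ga` file is JSON with a top-level `annotation` field, and the function names `annotation_lengths` and `over_limit` are made up for illustration, not part of Galaxy.

```python
import json
from pathlib import Path


def annotation_lengths(workflow_dir):
    """Map each .ga file name to the length of its top-level annotation.

    Assumption: a .ga file is a JSON document with an "annotation" key.
    A missing or null annotation counts as length 0.
    """
    lengths = {}
    for path in sorted(Path(workflow_dir).rglob("*.ga")):
        with path.open(encoding="utf-8") as handle:
            workflow = json.load(handle)
        lengths[path.name] = len(workflow.get("annotation") or "")
    return lengths


def over_limit(lengths, limit):
    """Return only the entries whose annotation exceeds `limit` characters."""
    return {name: n for name, n in lengths.items() if n > limit}
```

For example, `over_limit(annotation_lengths("workflows"), 250)` would list the files that a 250 char limit rejects, and the same call with `500` shows what the looser limit would still catch (`"workflows"` being a placeholder directory).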
Originally posted by @mvdbeek in #19591 (comment)