Merge pull request #130 from reichan1998/cram_handling
Implement cram chunking for PacBio and Nanopore
tkchafin authored Oct 24, 2024
2 parents 51a1d9f + 3cd2c01 commit 90910b7
Showing 29 changed files with 688 additions and 372 deletions.
88 changes: 0 additions & 88 deletions .github/workflows/download_pipeline.yml

This file was deleted.

2 changes: 1 addition & 1 deletion .github/workflows/linting.yml
@@ -19,7 +19,7 @@ jobs:
- uses: actions/setup-node@v3

- name: Install editorconfig-checker
- run: npm install -g editorconfig-checker
+ run: npm install -g editorconfig-checker@3.0.2

- name: Run ECLint check
run: editorconfig-checker -exclude README.md $(find .* -type f | grep -v '.git\|.py\|.md\|json\|yml\|yaml\|html\|css\|work\|.nextflow\|build\|nf_core.egg-info\|log.txt\|Makefile')
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,18 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [[1.3.1](https://github.com/sanger-tol/readmapping/releases/tag/1.3.0)] - Antipodean Opaleye (patch 1) - [2024-09-24]

### Enhancements & fixes

- Fixed bug in handling CRAM HiC inputs introduced in 1.1.0
- Fixed bug in handling PacBio FASTQ inputs introduced in 1.3.0

| Dependency | Old version | New version |
| ---------- | ----------- | ----------- |
| `bbtools` | | 39.01 |
| `seqtk` | 1.4 | |

## [[1.3.0](https://github.com/sanger-tol/readmapping/releases/tag/1.3.0)] - Antipodean Opaleye - [2024-08-23]

### Enhancements & fixes
12 changes: 6 additions & 6 deletions CITATIONS.md
@@ -10,6 +10,10 @@
## Pipeline tools

+ - [BBTools](http://sourceforge.net/projects/bbmap/)

+ > Bushnell B. BBTools software package. 2014. http://sourceforge.net/projects/bbmap/
- [Blast](https://pubmed.ncbi.nlm.nih.gov/20003500/)

> Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST+: architecture and applications. BMC Bioinformatics. 2009 Dec 15;10:421. doi: 10.1186/1471-2105-10-421. PMID: 20003500; PMCID: PMC2803857.
@@ -18,7 +22,7 @@

> Vasimuddin Md, Misra S, Li H, Aluru S. Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems. 2019 IEEE International Parallel and Distributed Processing Symposium. 2019 May;314–24. doi: 10.1109/IPDPS.2019.00041.
- - [CRUMBLE]
+ - [CRUMBLE](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6330002/)

> Bonfield JK, McCarthy SA, Durbin R. Crumble: reference free lossy compression of sequence quality values. Bioinformatics. 2019 Jan;35(2):337-339. doi: 10.1093/bioinformatics/bty608. PubMed PMID: 29992288; PMCID: PMC6330002.
@@ -30,14 +34,10 @@

> Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H. Twelve years of SAMtools and BCFtools. Gigascience. 2021 Feb 16;10(2):giab008. doi: 10.1093/gigascience/giab008. PMID: 33590861; PMCID: PMC7931819.
- - [SeqKit]
+ - [SeqKit](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5051824/)

> Shen W, Le S, Li Y, Hu F. SeqKit: A cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLoS One. 2016 Oct 5;11(10):e0163962. doi: 10.1371/journal.pone.0163962. PubMed PMID: 27706213; PMCID: PMC5051824.
- - [Seqtk]

- > Li H. Toolkit for processing sequences in FASTA/Q formats. GitHub Repository. 2012. https://github.com/lh3/seqtk. Accessed August 2024.
## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)
4 changes: 3 additions & 1 deletion LICENSE
@@ -1,6 +1,8 @@
MIT License

- Copyright (c) @priyanka-surana
+ Copyright (c) 2022-2024 Genome Research Ltd.
+ except `bin/filter_five_end.pl`:
+ Copyright (c) 2017 Arima Genomics, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
18 changes: 17 additions & 1 deletion conf/base.config
@@ -16,7 +16,7 @@ process {
// pipeline to self-heal from MEMLIMIT/RUNLIMIT.

// Default
- cpus = 1
+ cpus = { check_max( 1 * task.attempt, 'cpus' ) }
memory = { check_max( 50.MB * task.attempt, 'memory' ) }
time = { check_max( 30.min * task.attempt, 'time' ) }

@@ -41,6 +41,11 @@ process {
memory = { check_max( ((meta.datatype == "pacbio_clr" || meta.datatype == "ont") ? 2.GB : 1.GB) * task.attempt, 'memory' ) }
}

+ // minimum 1GB memory
+ withName: 'BBMAP_FILTERBYNAME' {
+ memory = { check_max( 1.GB * task.attempt, 'memory' ) }
+ }

withName: 'SAMTOOLS_COLLATETOFASTA' {
cpus = { log_increase_cpus(4, 2*task.attempt, 1, 2) }
memory = { check_max( 1.GB * Math.ceil( meta.read_count / 1000000 ) * task.attempt, 'memory' ) }
@@ -58,6 +63,12 @@ process {
time = { check_max( 2.h * Math.ceil( meta.read_count / 100000000 ) * task.attempt / log_increase_cpus(2, 6*task.attempt, 1, 2), 'time' ) }
}

+ withName: SAMTOOLS_ADDREPLACERG {
+ cpus = { log_increase_cpus(2, 6*task.attempt, 1, 2) }
+ memory = { check_max( 4.GB + 850.MB * log_increase_cpus(2, 6*task.attempt, 1, 2) * task.attempt + 0.6.GB * Math.ceil( meta.read_count / 100000000 ), 'memory' ) }
+ time = { check_max( 2.h * Math.ceil( meta.read_count / 100000000 ) * task.attempt / log_increase_cpus(2, 6*task.attempt, 1, 2), 'time' ) }
+ }

withName: BLAST_BLASTN {
time = { check_max( 2.hour * Math.ceil( meta.read_count / 1000000 ) * task.attempt, 'time' ) }
memory = { check_max( 100.MB + 20.MB * Math.ceil( meta.read_count / 1000000 ) * task.attempt, 'memory' ) }
@@ -109,6 +120,11 @@ process {
memory = { check_max( 1.GB * Math.ceil( 30 * fasta.size() / 1e+9 ) * task.attempt, 'memory' ) }
}

+ withName: GENERATE_CRAM_CSV {
+ cpus = { check_max( 4 * task.attempt, 'cpus' ) }
+ memory = { check_max( 16.GB * task.attempt, 'memory' ) }
+ }

withName: CRUMBLE {
// No correlation between memory usage and the number of reads or the genome size.
// Most genomes seem happy with 1 GB, then some with 2 GB, then some with 5 GB.
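
The resource rules in `conf/base.config` share one pattern: a base amount is multiplied by `task.attempt`, so each retry requests more, and the result is passed through `check_max`, which caps the request. The Python sketch below mirrors that arithmetic for the default memory rule and the `BLAST_BLASTN` memory rule. It is an illustration under assumptions: the function names and the `max_mb` ceiling are hypothetical, and `check_max` here is a plain clamp standing in for the pipeline's own helper, whose implementation is not shown in this diff.

```python
import math


def check_max(value, limit):
    """Clamp a requested resource to a ceiling (hypothetical stand-in)."""
    return min(value, limit)


def default_memory_mb(attempt, max_mb=128_000):
    # Mirrors `50.MB * task.attempt`: each retry asks for 50 MB more.
    return check_max(50 * attempt, max_mb)


def blastn_memory_mb(read_count, attempt, max_mb=128_000):
    # Mirrors `100.MB + 20.MB * Math.ceil(meta.read_count / 1000000) * task.attempt`:
    # a fixed 100 MB plus 20 MB per started million reads, scaled by the attempt.
    millions = math.ceil(read_count / 1_000_000)
    return check_max(100 + 20 * millions * attempt, max_mb)


if __name__ == "__main__":
    for attempt in (1, 2, 3):
        print(attempt, default_memory_mb(attempt), blastn_memory_mb(2_500_000, attempt))
```

Note that only the read-count term is multiplied by the attempt number in the `BLAST_BLASTN` rule; the 100 MB base stays fixed across retries.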
37 changes: 23 additions & 14 deletions conf/modules.config
@@ -16,6 +16,10 @@ process {
ext.args = '-F 0x200 -nt'
}

+ withName: BBMAP_FILTERBYNAME {
+ ext.args = 'include=f'
+ }

withName: SAMTOOLS_MERGE {
beforeScript = { "export REF_PATH=spoof"}
ext.args = { "-c -p" }
@@ -46,14 +50,6 @@ process {
ext.args = "--output-fmt cram"
}

- withName: '.*:.*:ALIGN_HIC:BWAMEM2_MEM' {
- ext.args = { "-5SPCp -R ${meta.read_group}" }
- }

- withName: '.*:.*:ALIGN_ILLUMINA:BWAMEM2_MEM' {
- ext.args = { "-p -R ${meta.read_group}" }
- }

withName: ".*:ALIGN_ILLUMINA:.*:CRAM_FILTER_ALIGN_BWAMEM2_FIXMATE_SORT" {
ext.args = ""
ext.args1 = { "-F 0x200 -nt" }
@@ -95,16 +91,29 @@ process {
// NOTE: minimap2 uses the decimal system ! 1G = 1,000,000,000 bp
// NOTE: Math.ceil returns a double, but fortunately minimap2 accepts floating point values.
// NOTE: minimap2 2.25 raises the default to 8G, which means higher memory savings on smaller genomes
- withName: '.*:.*:ALIGN_HIFI:MINIMAP2_ALIGN' {
- ext.args = { "-ax map-hifi --cs=short -R ${meta.read_group} -I" + Math.ceil(meta2.genome_size/1e9) + 'G' }

+ withName: ".*:ALIGN_HIFI:.*:CRAM_FILTER_MINIMAP2_FILTER5END_FIXMATE_SORT" {
+ ext.args = ""
+ ext.args1 = { "-F 0x200 -nt" }
+ ext.args2 = { "-ax map-hifi --cs=short -I" + Math.ceil(meta.genome_size/1e9) + 'G' }
+ ext.args3 = "-mpu"
+ ext.args4 = { "--write-index -l1" }
}

- withName: '.*:.*:ALIGN_CLR:MINIMAP2_ALIGN' {
- ext.args = { "-ax map-pb -R ${meta.read_group} -I" + Math.ceil(meta2.genome_size/1e9) + 'G' }
+ withName: ".*:ALIGN_CLR:.*:CRAM_FILTER_MINIMAP2_FILTER5END_FIXMATE_SORT" {
+ ext.args = ""
+ ext.args1 = { "-F 0x200 -nt" }
+ ext.args2 = { "-ax map-pb -I" + Math.ceil(meta.genome_size/1e9) + 'G' }
+ ext.args3 = "-mpu"
+ ext.args4 = { "--write-index -l1" }
}

- withName: '.*:.*:ALIGN_ONT:MINIMAP2_ALIGN' {
- ext.args = { "-ax map-ont -R ${meta.read_group} -I" + Math.ceil(meta2.genome_size/1e9) + 'G' }
+ withName: ".*:ALIGN_ONT:.*:CRAM_FILTER_MINIMAP2_FILTER5END_FIXMATE_SORT" {
+ ext.args = ""
+ ext.args1 = { "-F 0x200 -nt" }
+ ext.args2 = { "-ax map-ont -I" + Math.ceil(meta.genome_size/1e9) + 'G' }
+ ext.args3 = "-mpu"
+ ext.args4 = { "--write-index -l1" }
}

withName: '.*:CONVERT_STATS:SAMTOOLS_CRAM' {
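
The three long-read blocks in this hunk build their minimap2 arguments the same way: a per-datatype `-ax` preset followed by `-I` set to the genome size rounded up to whole gigabases. As the config comments note, `Math.ceil` returns a double, so the concatenated value carries a decimal point. The Python sketch below reproduces that string construction. It is an illustration only: the `PRESETS` table and the function name are hypothetical, and the value is formatted as a float on purpose to mimic the Groovy behaviour described in the comments.

```python
import math

# Hypothetical lookup of the presets that appear in ext.args2 above.
PRESETS = {
    "hifi": "map-hifi --cs=short",
    "clr": "map-pb",
    "ont": "map-ont",
}


def minimap2_args(datatype, genome_size_bp):
    # minimap2 counts in decimal units, so 1G is 1e9 bp; round up to whole G.
    # float() mimics Groovy's Math.ceil, which returns a double.
    batch_g = float(math.ceil(genome_size_bp / 1e9))
    return f"-ax {PRESETS[datatype]} -I{batch_g}G"


if __name__ == "__main__":
    print(minimap2_args("hifi", 2_400_000_000))
    print(minimap2_args("ont", 500_000_000))
```

Compared with the removed `MINIMAP2_ALIGN` rules, the new argument strings no longer include `-R ${meta.read_group}`.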
10 changes: 5 additions & 5 deletions modules.json
@@ -5,6 +5,11 @@
"https://github.com/nf-core/modules.git": {
"modules": {
"nf-core": {
+ "bbmap/filterbyname": {
+ "branch": "master",
+ "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
+ "installed_by": ["modules"]
+ },
"blast/blastn": {
"branch": "master",
"git_sha": "583edaf97c9373a20df05a3b7be5a6677f9cd719",
@@ -91,11 +96,6 @@
"git_sha": "03fbf6c89e551bd8d77f3b751fb5c955f75b34c5",
"installed_by": ["modules"]
},
- "seqtk/subseq": {
- "branch": "master",
- "git_sha": "730f3aee80d5f8d0b5fc532202ac59361414d006",
- "installed_by": ["modules"]
- },
"untar": {
"branch": "master",
"git_sha": "4e5f4687318f24ba944a13609d3ea6ebd890737d",
5 changes: 5 additions & 0 deletions modules/nf-core/bbmap/filterbyname/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

71 changes: 71 additions & 0 deletions modules/nf-core/bbmap/filterbyname/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
