Standardize csvtk and tsv-utils usage [#23]
* Wrap tsv-utils usage in `csv2tsv --csv-delim $'\t'` /
  `csvtk fix-quotes --tabs`
* Remove `csvtk fix-quotes` at start of pipeline in
  "format_ncbi_dataset_report" rule
* Remove `-l` flag in "format_ncbi_dataset_report" rule
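Background for the first bullet: tsv-utils tools read plain TSV with no quoting rules, so a quoted field that contains a tab or newline would be split in the wrong place. The Python sketch below illustrates the round trip under that assumption. `strip_tsv_quotes` approximates `csv2tsv --csv-delim $'\t'` (drop the quotes and replace embedded tabs and newlines with a space, csv2tsv's default replacement), and `requote_tsv` approximates `csvtk fix-quotes --tabs` by re-quoting fields that contain a double quote. Both function names are hypothetical; this shows the idea of the wrapper, not how either tool is implemented.

```python
import csv
import io


def strip_tsv_quotes(text: str, replacement: str = " ") -> str:
    r"""Approximate `csv2tsv --csv-delim $'\t'`: parse quote-aware TSV and
    emit plain TSV with quotes removed. Embedded TABs and newlines inside a
    field are replaced so a quote-unaware tool cannot mistake them for
    separators."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quotechar='"')
    lines = []
    for row in reader:
        cleaned = [
            f.replace("\r\n", replacement).replace("\n", replacement).replace("\t", replacement)
            for f in row
        ]
        lines.append("\t".join(cleaned))
    return "".join(line + "\n" for line in lines)


def requote_tsv(text: str) -> str:
    """Approximate `csvtk fix-quotes --tabs` on plain TSV: fields that contain
    a double quote are wrapped in quotes, with inner quotes doubled."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", quotechar='"',
                        quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    for line in text.splitlines():
        # Plain TSV has no embedded tabs, so a simple split is safe here.
        writer.writerow(line.split("\t"))
    return out.getvalue()


# Hypothetical example: one field holds a tab, another holds quotes.
quoted = 'name\tnote\nA\t"has\ttab"\nB\t"say ""hi"""\n'
plain = strip_tsv_quotes(quoted)  # safe to pass to a tsv-utils tool
print(plain, end="")
print(requote_tsv(plain), end="")
```

Note that the round trip is lossy by design: an embedded tab comes back as a space, which is the price of feeding quoted data to a tool that does not understand quoting.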
genehack committed Dec 19, 2024
1 parent a08660b commit eb3b0a5
Showing 2 changed files with 6 additions and 7 deletions.
ingest/rules/curate.smk: 5 changes (3 additions, 2 deletions)
@@ -125,6 +125,7 @@ rule subset_curated_metadata_columns:
metadata_fields=",".join(config["curate"]["metadata_columns"]),
shell:
r"""
-tsv-select -H -f {params.metadata_fields} \
-{input.metadata} > {output.metadata}
+csvtk cut -t -f {params.metadata_fields} \
+{input.metadata} \
+> {output.metadata}
"""
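The curate.smk hunk swaps `tsv-select -H -f` for `csvtk cut -t -f`; both keep only the named columns of a headered TSV, but csvtk also parses quoted fields. A rough Python equivalent is sketched below, assuming the field list arrives as the comma-joined `metadata_columns` from the config and that columns should come out in the listed order. `cut_columns` and the sample column names are hypothetical.

```python
import csv
import io


def cut_columns(tsv_text: str, fields: list[str]) -> str:
    """Keep only the named columns of a headered, quote-aware TSV, in the
    order given (roughly the effect of `csvtk cut -t -f a,b`)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [f for f in fields if f not in header]
    if missing:
        # This sketch fails loudly on unknown names.
        raise KeyError(f"columns not in header: {missing}")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, delimiter="\t",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # columns not in `fields` are dropped
    return out.getvalue()


# Hypothetical config value; the rule joins config["curate"]["metadata_columns"] with ",".
metadata_fields = ",".join(["strain", "accession", "date"])
table = "accession\tstrain\tdate\thost\nAB1\tX/1\t2020-01-01\tHomo sapiens\n"
print(cut_columns(table, metadata_fields.split(",")), end="")
```

Raising on an unknown column name is a choice made for this sketch; check `csvtk cut --help` for how the real tool reports a missing field.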
ingest/rules/fetch_from_ncbi.smk: 8 changes (3 additions, 5 deletions)
@@ -89,12 +89,10 @@ rule format_ncbi_dataset_report:
--fields {params.ncbi_datasets_fields:q} \
--elide-header \
| csvtk fix-quotes -Ht \
-| csvtk add-header -t -l -n {params.ncbi_datasets_fields:q} \
+| csvtk add-header -t -n {params.ncbi_datasets_fields:q} \
| csvtk rename -t -f accession -n accession_version \
-| csvtk -t mutate -f accession_version -n accession -p "^(.+?)\." \
-| csvtk del-quotes -t \
-| tsv-select -H -f accession --rest last \
-> {output.ncbi_dataset_tsv}
+| csvtk -t mutate -f accession_version -n accession -p "^(.+?)\." --at 1 \
+> {output.ncbi_dataset_tsv}
"""

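In `format_ncbi_dataset_report`, the new final stage relies on `--at 1` to place the derived `accession` column first, which is what the removed `csvtk del-quotes` / `tsv-select -H -f accession --rest last` steps used to accomplish. The Python sketch below reproduces that stage on the output of the `rename` step: the lazy pattern `^(.+?)\.` captures everything before the first `.` of `accession_version` (a hypothetical `OQ123456.1` yields `OQ123456`). When the pattern does not match, the sketch copies the original value through; that is assumed here to match csvtk mutate's default (its `--na` flag is documented as blanking unmatched values instead), so verify it against `csvtk mutate --help`. `add_accession_column` and the sample rows are hypothetical.

```python
import csv
import io
import re

# Same pattern as the rule: lazily capture everything before the first ".".
ACCESSION_RE = re.compile(r"^(.+?)\.")


def add_accession_column(tsv_text: str) -> str:
    """Insert an `accession` column derived from `accession_version` as the
    first column (the effect the rule gets from `csvtk mutate ... --at 1`)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    src = header.index("accession_version")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["accession"] + header)
    for row in reader:
        value = row[src]
        match = ACCESSION_RE.search(value)
        # Assumed default: an unmatched value is copied through unchanged.
        writer.writerow([match.group(1) if match else value] + row)
    return out.getvalue()


# Hypothetical rows, as they might look after the `rename` step.
renamed = "accession_version\tlength\nOQ123456.1\t10747\nNOVERSION\t500\n"
print(add_accession_column(renamed), end="")
```

Placing the column during the mutate step avoids a second pass (and the quote stripping that `tsv-select` required) just to reorder columns.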

