3. Useful Commands
Liam Bindle edited this page Feb 24, 2022 · 12 revisions
This page lists useful commands built around bashdatacatalog-list and bashdatacatalog-execdir.
$ bashdatacatalog-list -am catalog.csv
$ bashdatacatalog-list -ae catalog.csv
$ bashdatacatalog-list -am -r "2015-01-01,2018-12-31" catalog.csv
$ bashdatacatalog-list -aw -p "file[123]" catalog.csv
Note: Listing wrong files can take a significant amount of time because a checksum must be calculated for each file. This is why the -p "PATTERN" argument, which restricts the files being checked, is often useful with -w.
$ bashdatacatalog-list -u -r "2015-01-01,2018-12-31" catalog.csv
Note: Unnecessary files are untracked files, plus temporal files whose timestamps fall outside the provided date range.
$ bashdatacatalog-list -am -f xargs-curl catalog.csv | xargs curl
$ bashdatacatalog-list -am -f xargs-curl catalog.csv | xargs -P 4 curl
$ bashdatacatalog-list -am -f url catalog.csv > url_download_list.txt
$ wget -i url_download_list.txt -x -nH -nv --cut-dirs=4 # set N in --cut-dirs=N to the number of leading URL directories to strip
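With -x (create directories) and -nH (no hostname directory), wget recreates each URL's path locally, and --cut-dirs=N drops the first N directory components of that path. The sketch below is a hypothetical helper, not part of bashdatacatalog or wget, that approximates the local path wget would use for a given URL and N, which can help when choosing N. The example URL is made up, and the helper ignores query strings and URL encoding.

```shell
#!/bin/sh
# Hypothetical helper (not part of bashdatacatalog or wget): approximate
# the local path used by `wget -x -nH --cut-dirs=N URL`.
# The body is wrapped in ( ) so its variables stay local to a subshell.
local_path_for_url() (
    url=$1
    n=$2
    path=${url#*://}    # strip the scheme, e.g. "https://"
    path=${path#*/}     # strip the hostname (what -nH does)
    i=0
    # Drop up to N leading directories (what --cut-dirs=N does);
    # the file name itself is never cut.
    while [ "$i" -lt "$n" ]; do
        case $path in
            */*) path=${path#*/} ;;
            *) break ;;
        esac
        i=$((i + 1))
    done
    printf '%s\n' "$path"
)

# Example with a made-up URL: prints "Dir/file.nc"
local_path_for_url "https://data.example.com/pub/a/b/c/Dir/file.nc" 4
```

In that example, N=5 would save file.nc directly in the current directory.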
$ bashdatacatalog-list -am -f rsync catalog.csv > file_list.txt
$ rsync -av --files-from=file_list.txt user@host:/remote-data-root/ .
$ bashdatacatalog-list -am -f globus="$(pwd),/remote-data-root/" catalog.csv > globus_batch.txt
$ globus transfer --batch globus_batch.txt SOURCE_ENDPOINT_ID DEST_ENDPOINT_ID
$ bashdatacatalog-list -u -r "2015-01-01,2018-12-31" -f xargs-rm catalog.csv | xargs rm
$ bashdatacatalog-list -ae catalog.csv | xargs chgrp groupname
Note: You might need to use sudo xargs chgrp groupname.
$ bashdatacatalog-execdir 'find . -type d -exec chgrp groupname {} +' catalog.csv
$ bashdatacatalog-execdir 'pwd' catalog.csv | sed "s#$(pwd)/*##g" | awk -F '/' '{for (i=1; i<=NF; i++) { for(j=1; j<=i; ++j) printf "%s/",$j ; printf "\n"} }' | sort | uniq | xargs chgrp groupname
Note: You might need to use sudo chgrp groupname.
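In the bashdatacatalog-execdir 'pwd' pipeline above, the sed stage makes each collection directory relative to the current directory, and the awk stage expands each relative path into all of its parent directories, so chgrp is also applied to the directories above each collection. The sketch below runs the same awk program on made-up paths, wrapped in a hypothetical function name; sort -u is equivalent to sort | uniq.

```shell
#!/bin/sh
# Hypothetical wrapper around the awk stage used above: for each
# slash-separated path on stdin, print every leading prefix
# ("a/", "a/b/", "a/b/c/"), then de-duplicate the results.
expand_dir_prefixes() {
    awk -F '/' '{for (i=1; i<=NF; i++) { for(j=1; j<=i; ++j) printf "%s/",$j ; printf "\n"} }' |
        LC_ALL=C sort -u
}

# Made-up collection paths; prints a/, a/b/, a/b/c/, and a/b/d/ (one per line)
printf '%s\n' 'a/b/c' 'a/b/d' | expand_dir_prefixes
```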
$ bashdatacatalog-list -a -r "2015-01-01,2018-12-31" catalog.csv | sort > tracked_files.txt
$ find -L . -name .asset_patches -prune -o -type f \( ! -name '\.*' \) -print | sort > all_files_in_tree.txt
$ comm -13 tracked_files.txt all_files_in_tree.txt > unnecessary_files.txt
$ xargs rm < unnecessary_files.txt # be careful! Review unnecessary_files.txt first.
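In the steps above, comm -13 keeps only the lines that appear in the second file, which here are files present in the directory tree but not tracked by the catalog. comm expects both inputs to be sorted the same way, which is why both lists are piped through sort. The paths in the two lists must also be written the same way (for example, both with or both without a leading ./) for lines to match. The small demo below uses made-up file names in temporary files and sets LC_ALL=C so that sort and comm agree on ordering.

```shell
#!/bin/sh
# Demo of the `comm -13` step with made-up file names.
tmp=$(mktemp -d)
printf '%s\n' ./b.nc ./a.nc | LC_ALL=C sort > "$tmp/tracked_files.txt"
printf '%s\n' ./stale.nc ./a.nc ./b.nc | LC_ALL=C sort > "$tmp/all_files_in_tree.txt"

# -1 hides lines found only in the first file and -3 hides lines found in
# both, so only lines unique to the second file (untracked files) remain.
untracked=$(LC_ALL=C comm -13 "$tmp/tracked_files.txt" "$tmp/all_files_in_tree.txt")
printf '%s\n' "$untracked"    # prints ./stale.nc

rm -rf "$tmp"
```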
$ bashdatacatalog-list -es catalog.csv | xargs stat --printf="%s\n" | awk '{s+=$1} END {print s}'
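The awk stage in the command above adds up the byte counts printed by stat (the --printf option is GNU stat's; BSD and macOS stat use different flags). If a human-readable total is more convenient, the awk program can also divide by 1024^3. A sketch, using a hypothetical function name:

```shell
#!/bin/sh
# Hypothetical helper: sum byte counts (one per line on stdin) and print the
# total in bytes and in GiB (1 GiB = 1024^3 bytes). %.0f prints the total
# as a plain integer even when it is large.
sum_sizes() {
    awk '{ s += $1 } END { printf "%.0f bytes (%.2f GiB)\n", s, s / (1024 * 1024 * 1024) }'
}

# Made-up input: prints "2147483648 bytes (2.00 GiB)"
printf '%s\n' 1073741824 1073741824 | sum_sizes
```

Applied to a catalog, it would replace the final awk stage: bashdatacatalog-list -es catalog.csv | xargs stat --printf="%s\n" | sum_sizes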
Consider giving the bashdatacatalog a Star ⭐ if you find it useful. This increases visibility and helps justify maintaining this repository.