How to print the heading of a tsv when using -H argument? #245

Open
AnabasisXu opened this issue Nov 10, 2024 · 1 comment

AnabasisXu commented Nov 10, 2024

First, thank you for developing 'GoAWK' with great csv/tsv support!

I am trying to implement the filter command of Miller with the -H argument of GoAWK:

mlr --icsv --opprint filter '$color == "red"' example.csv
color shape flag k index quantity rate
red square true 2 15 79.2778 0.0130
red circle true 3 16 13.8103 2.9010
red square false 4 48 77.5542 7.4670
red square false 6 64 77.1991 9.5310

https://miller.readthedocs.io/en/latest/10min/

With the -H argument, GoAWK consumes the header line, so NR == 1 matches the first data row (the second line of the file):

❯ goawk -itsv -H 'NR == 1 {print $0} ' 3.tsv
珊瑚    5

❯ cat 3.tsv
品种    价格
珊瑚    5
沙漠王  3

Using FIELDS does not seem to make it easier:

❯ goawk -i tsv -H '{ for (i=1; i in FIELDS; i++) printf "%s%s", FIELDS[i], (i < length(FIELDS) ? OFS : ORS); exit }' 3.tsv
品种 价格

Maybe just combine head -n 1 3.tsv with goawk -itsv -H 'NR == 1 {print $0} ' 3.tsv? It would mean two processes instead of one.

My current working function that implements the filter command of mlr:

function gtf() {
  # not sure how to do it with -H
  goawk -itsv 'NR == 1 {print $0}' "$3" 
  # filter lines
  goawk -itsv -H "@\"$1\" == $2 {print \$0}" "$3"
}

❯ gtf 价格 5 3.tsv
品种    价格
珊瑚    5
达尔奥索        5

❯ mlr --itsv --opprint filter '$价格==5' 3.tsv
品种   价格
珊瑚   5
达尔奥索 5

benhoyt (Owner) commented Nov 11, 2024

Yeah, this isn't the easiest with GoAWK right now. At present you either have to hard-code the field names in BEGIN:

# hard-code field names in BEGIN
$ goawk -icsv -H 'BEGIN { print "color,shape,flag,k,index,quantity,rate" } @"color" == "red"' example.csv 
color,shape,flag,k,index,quantity,rate
red,square,true,2,15,79.2778,0.0130
red,circle,true,3,16,13.8103,2.9010
red,square,false,4,48,77.5542,7.4670
red,square,false,6,64,77.1991,9.5310

Or define a reusable print_fields function that loops over FIELDS and prints the names in CSV format, and call it once when the first record is read (you can't call it in BEGIN because FIELDS is not set yet at that point).
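For the TSV case from the question, that second option might look roughly like the sketch below. It is only an illustration under a few assumptions: the gtf name and argument order are taken from the question, the variables col, val and c are made up for this example, and the header names are printed without any quoting, so it assumes they contain no tabs or newlines. The column is looked up in FIELDS rather than interpolated into the program text, and the value is passed with -v so the shell does not have to escape it.

```shell
# gtf COLUMN VALUE FILE
# Print the TSV header, then every row whose COLUMN equals VALUE,
# using a single goawk process.
gtf() {
  goawk -itsv -H -v col="$1" -v val="$2" '
    # Print the header names stored in FIELDS, tab-separated.
    # Assumption: names need no quoting (no tabs or newlines inside them).
    function print_fields(    i, n) {
      n = length(FIELDS)
      for (i = 1; i <= n; i++)
        printf "%s%s", FIELDS[i], (i < n ? "\t" : "\n")
    }

    # FIELDS is not set in BEGIN, so do the header work on the first data row.
    NR == 1 {
      print_fields()
      # Remember the index of the requested column (c stays 0 if not found).
      for (i = 1; i in FIELDS; i++)
        if (FIELDS[i] == col) c = i
    }

    # Print rows whose selected column equals the requested value.
    c && $c == val
  ' "$3"
}
```

Called as gtf 价格 5 3.tsv, this should print the header line followed by the matching rows. One limitation of this approach: the header is only printed once a first data row has been read, so a file containing nothing but the header line produces no output.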

I wouldn't mind designing a better API for this, but I'm not sure what it would look like. With CSV output mode, you can print arbitrary fields, so I'm not sure where the names would come from. Feel free to make suggestions. See also #127 (but I don't love that API either).
