Consider making handling of CR LF newlines more consistent with Gawk #51
I've thought about this a bit more, and I prefer the GoAWK behavior here, so I'm going to stick with it for now. Including the CR in the field seems against the spirit of |
I am confused by this GoAWK behavior. Apart from GoAWK, the other AWK implementations handle newline characters consistently (tested on Ubuntu 20.04):
$ printf "A\r\nB\rC\nD" | goawk 'BEGIN{RS="\n"} {printf $0}' | hexdump -C
00000000 41 42 0d 43 44 |AB.CD|
$ printf "A\r\nB\rC\nD" | mawk 'BEGIN{RS="\n"} {printf $0}' | hexdump -C
00000000 41 0d 42 0d 43 44 |A.B.CD|
$ printf "A\r\nB\rC\nD" | gawk 'BEGIN{RS="\n"} {printf $0}' | hexdump -C
00000000 41 0d 42 0d 43 44 |A.B.CD|
$ printf "A\r\nB\rC\nD" | busybox awk 'BEGIN{RS="\n"} {printf $0}' | hexdump -C
00000000 41 0d 42 0d 43 44 |A.B.CD|
$ printf "A\r\nB\rC\nD" | original-awk 'BEGIN{RS="\n"} {printf $0}' | hexdump -C
00000000 41 0d 42 0d 43 44 |A.B.CD|
If you prefer the GoAWK behavior, how about setting the default value of
$ printf "A\r\nB\rC\nD" | goawk 'BEGIN{RS="\r?\n"} {printf $0}' | hexdump -C
00000000 41 42 0d 43 44 |AB.CD|
$ printf "A\r\nB\rC\nD" | mawk 'BEGIN{RS="\r?\n"} {printf $0}' | hexdump -C
00000000 41 42 0d 43 44 |AB.CD|
$ printf "A\r\nB\rC\nD" | gawk 'BEGIN{RS="\r?\n"} {printf $0}' | hexdump -C
00000000 41 42 0d 43 44 |AB.CD|
$ printf "A\r\nB\rC\nD" | busybox awk 'BEGIN{RS="\r?\n"} {printf $0}' | hexdump -C
00000000 41 42 0d 43 44 |AB.CD|
# See POSIX documentation below (nawk: awk version 20121220)
$ printf "A\r\nB\rC\nD" | original-awk 'BEGIN{RS="\r?\n"} {printf $0}' | hexdump -C
00000000 41 0a 42 43 0a 44 |A.BC.D|
# This is fixed in the nawk shipped with macOS 11.6.5 (nawk: awk version 20200816)
$ printf "A\r\nB\rC\nD" | /usr/bin/awk 'BEGIN{RS="\r?\n"} {printf $0}' | hexdump -C
00000000 41 42 0d 43 44 |AB.CD|
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
In my opinion, portability is important. And in any case, we need a way to treat |
Thanks, I'm going to reopen this issue to revisit this.
I don't want to have to care whether input text comes with \n or \r\n at the end of lines, and goawk makes this dream come true. With normal awk I can get code working on a Unix-like system, deploy it to Windows (or process a file from Windows), and watch it crash and burn. Having to remember to say BEGIN{RS="\r?\n"} in every script is not a good solution.
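For comparison, a workaround that does not rely on a regex RS (which the nawk 20121220 output earlier in this thread shows is not handled the same everywhere) is to strip a trailing CR from each record inside the script. This is only a minimal sketch: `strip_cr` is a hypothetical helper name, and it assumes some POSIX-style awk is installed as `awk`.

```shell
# Hypothetical helper: remove one trailing CR from every record, then print it.
# If the awk in use already drops the CR, sub() matches nothing and the
# record is printed unchanged, so the filter should be safe either way.
strip_cr() {
  awk '{ sub(/\r$/, ""); print }'
}

# Mixed CR LF and LF input; od -c makes any remaining \r visible.
printf 'a b\r\nc d\n' | strip_cr | od -c
```

It still has to be written into every script, so it does not address the complaint above; it only sidesteps the RS portability differences.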
Per discussion on issue #33 (from here down), GoAWK handles CR LF (Windows) line endings differently from gawk (I haven't tried awk or mawk). GoAWK doesn't include the CR in the field (because it's part of the line ending), whereas Gawk does. I'm not sure if there are differences between Gawk's handling on Windows and Linux.
I kinda think the GoAWK approach is more sensible and platform-native, but consistency with other AWKs is good too ... worth thinking about further.
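One way to see the difference described above is to print the length of the last field on a CR LF terminated line. A rough sketch, assuming the awk under test is on the PATH as `awk` (swap in `goawk`, `gawk`, or `mawk` as needed); `last_field_len` is a hypothetical helper name, not part of any of these tools.

```shell
# Hypothetical helper: print the length of the last field of each record.
# If the CR is kept as part of the field, "b\r" has length 2; if the CR is
# treated as part of the line ending, "b" has length 1.
last_field_len() {
  awk '{ print length($NF) }'
}

printf 'a b\r\n' | last_field_len
```

Going by the description above, Gawk should report the longer length and GoAWK the shorter one, but the result depends on the implementation and version in use.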
Arnold Robbins said this: