Rmk32 eol convention for input defaults to ANY, extend OPENSTREAM so that EOL can be specified as an "external format" #1785
As per technical meeting on 7/15/2024
… EOL" This reverts commit 6a7e8c3.
(* ; "Edited 6-Jul-2022 00:00 by rmk")
(* ; "Edited 19-Dec-2021 09:30 by rmk")
(* ; "Edited 14-Dec-2021 16:10 by rmk")
(* ; "Edited 13-Dec-2021 15:20 by rmk")
(* ; "Edited 29-Jun-2021 17:07 by rmk:")
(* ; "Edited 5-Oct-92 13:45 by jds")

(* ;; "RMK: July 2024: Default EOL to ANY on input streams, allow EXTERNAL FORMAT to be a (FORMAT EOL) list so CL:OPEN can get the EOL")
Following the principle of "be liberal in what you accept, and conservative in what you generate", would it make sense for the EXTERNALFORMAT to be in proplist (:key val ...) format? That would allow the items to be in either order, and would establish the pattern for extending this in the future, if necessary.
Should that generalization be added to the implementation of CL:OPEN before it calls IL:OPENSTREAM? There it could also ensure the EOL symbols are in the IL: package, and put the values in the correct order.
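One way to picture the proplist suggestion is the sketch below, written in Python as a neutral modelling language rather than as Medley code. The property names `:FORMAT` and `:EOL`, the function name `normalize_plist`, and the stripping of a leading colon as a stand-in for re-interning the EOL symbol in the IL: package are all assumptions made for illustration; none of them come from the PR.

```python
# Hypothetical model of normalizing a (:key val ...) external-format plist
# into a fixed (format, eol) order before handing it to the stream opener.
EOL_CONVENTIONS = ("CR", "LF", "CRLF", "ANY")
KNOWN_KEYS = (":FORMAT", ":EOL")  # assumed property names, not from the PR


def normalize_plist(plist):
    """Return (format, eol); either element is None when not supplied."""
    if len(plist) % 2 != 0:
        raise ValueError("property list needs an even number of items")
    props = {}
    for key, value in zip(plist[0::2], plist[1::2]):
        if key not in KNOWN_KEYS:
            raise ValueError(f"unknown property: {key}")
        # As with a Lisp plist lookup, the first occurrence of a key wins.
        props.setdefault(key, value)
    eol = props.get(":EOL")
    if eol is not None:
        # Stand-in for moving the symbol into the IL: package:
        # accept :lf, :LF or LF and canonicalize to LF.
        eol = eol.lstrip(":").upper()
        if eol not in EOL_CONVENTIONS:
            raise ValueError(f"unknown EOL convention: {eol}")
    return (props.get(":FORMAT"), eol)


print(normalize_plist([":EOL", ":lf", ":FORMAT", ":UTF-8"]))
```

Because the keys are looked up by name, the caller can give them in either order, and a new property could later be added to `KNOWN_KEYS` without changing existing call sites.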
Consider the possibility that this change has too much flexibility, and that the flexibility means more error cases. What are the uses of EXTERNAL-FORMAT? When you are copying from one place to another, can you copy bytes instead of characters? (The ELEMENT-TYPE of Common Lisp streams can be BYTE or CHARACTER.)
A simpler-to-implement and more backward-compatible approach would be to get rid of EOL as a separate parameter and "bake" it into the EXTERNAL-FORMAT keyword:
We currently have :UTF-8 and :XCCS as the two frequent cases.
Declare that UTF-8 implies EOL=LF and add (if you need it) :UTF-8-CR or :UTF-8-CRLF.
Declare that XCCS implies EOL=CR on output and ANY on input.
Then you don't have to edit every place where a program assumes EQ can be used to answer whether two streams have the same EXTERNAL-FORMAT, which could happen anywhere.
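The "baked-in" alternative amounts to a fixed lookup table keyed by a single format name. A rough Python model, assuming that the UTF-8 variants use the same convention in both directions (the comment only states this explicitly for XCCS) and using a hypothetical helper name `eol_for`:

```python
# Hypothetical table: each external-format keyword fully determines its EOL.
# The INPUT values for the UTF-8 variants are an assumption; the comment
# above only distinguishes input from output for :XCCS.
BAKED_EOL = {
    ":UTF-8":      {"INPUT": "LF",   "OUTPUT": "LF"},
    ":UTF-8-CR":   {"INPUT": "CR",   "OUTPUT": "CR"},
    ":UTF-8-CRLF": {"INPUT": "CRLF", "OUTPUT": "CRLF"},
    ":XCCS":       {"INPUT": "ANY",  "OUTPUT": "CR"},
}


def eol_for(external_format, direction):
    """Look up the EOL convention implied by a format name and direction."""
    return BAKED_EOL[external_format][direction]


print(eol_for(":XCCS", "INPUT"), eol_for(":XCCS", "OUTPUT"))
```

Since every entry is named by one atom, two streams can still be compared by identity of their format names, which is the property the comment wants to preserve.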
If you want to copy bytes, use COPYBYTES. If you want to copy characters, use COPYCHARS (which will convert the bytes from one format to another). Does Common Lisp specify a function that branches on the element-type? It should choose which of the subfunctions to call.
Each external format already has its own default EOL convention. This extension is for the case where for whatever reason the user wants to override that. For OPENSTREAM the override can be passed as a separate parameter, but CL:OPEN doesn't allow for that kind of additional specification. This is all about sneaking that in without doing more serious damage.
This doesn't affect what is returned as the external-format STREAMPROP of the stream, it's always an EQ-able atom. It's just that if the EOL convention had been changed from its default, the property in the external format wouldn't be accurate.
In Interlisp the function STREAMPROP can be used to change the format and the EOL separately, after the open. Does Common Lisp support that kind of operation? (Another use case for STREAMPROP: the ENDOFSTREAMOP as a stream property rather than something that has to be specified on each input operation. Does Common Lisp support that?)
I probably don't yet have the correct logic for the EOL convention of external formats, as we transition to ANY as the default for input streams. At open, ANY should be installed for input streams even if the format specifies one of the specific conventions. The format's convention should apply by default only to output streams. If the user really wants a specific convention on input, then an override should be applied (at open or by STREAMPROP).
(BTW, in the original, inherited implementation of external formats there was a flag EOLVALID. I don't understand the use case for that, and it isn't fetched anywhere in our core directories. But I left it in.)
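The defaulting rule described in this comment (ANY on input, the format's own convention on output, with an explicit override taking precedence in either case) can be modelled in a few lines. This is a Python sketch of the stated intent only, not of the actual OPENSTREAM code; the function name `effective_eol` and the restriction to plain INPUT and OUTPUT access are assumptions.

```python
# Hypothetical model of choosing a stream's EOL convention at open time.
def effective_eol(format_default, direction, override=None):
    """Pick the EOL convention for a newly opened stream.

    format_default: the convention the external format declares (e.g. "LF")
    direction:      "INPUT" or "OUTPUT" (other access modes are not modelled)
    override:       an explicit user choice, given at open or set afterwards
    """
    if direction not in ("INPUT", "OUTPUT"):
        raise ValueError(f"unsupported direction: {direction}")
    if override is not None:
        return override          # the user's explicit choice always wins
    if direction == "INPUT":
        return "ANY"             # input ignores the format's own convention
    return format_default        # output follows the format


print(effective_eol("LF", "INPUT"))
print(effective_eol("LF", "OUTPUT"))
print(effective_eol("LF", "INPUT", override="CRLF"))
```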
Last week we decided to investigate some different options for ANY: find the first EOL and use that interpretation throughout.
Two things to think about: first, EOL with input = ANY means you can do COPYBYTES.
Second, use EXTERNALFORMAT for the EOL convention.
:FIRST-USE? COPYCHARS vs. COPYBYTES. We're moving this to Draft.
Has any additional work been done on this?
Nothing more has been done. I believe that the next step is to add another 2-bit field to the STREAM datatype (beyond the part that Maiko knows about) to hold the actual EOL convention that is detected when the file is read as ANY. This is so that COPYCHARS can preserve the original EOL convention of the characters, and even be consistent if the EOL convention changes across the file.
BTW, there is a long related discussion at issue #345
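To make the "detect the first EOL and remember it in two bits" idea concrete, here is a Python sketch. It assumes a byte stream in which CR and LF appear as the single bytes 0x0D and 0x0A, and the numeric codes in `EOL_CODE` are invented for illustration; the actual STREAM field layout is not specified in this thread.

```python
# Hypothetical 2-bit encoding of the detected convention (values are made up).
EOL_CODE = {None: 0b00, "CR": 0b01, "LF": 0b10, "CRLF": 0b11}


def detect_first_eol(data):
    """Return "CR", "LF", "CRLF", or None if the buffer has no line ending."""
    for i, byte in enumerate(data):
        if byte == 0x0A:
            return "LF"
        if byte == 0x0D:
            # A CR immediately followed by LF counts as one CRLF ending.
            if i + 1 < len(data) and data[i + 1] == 0x0A:
                return "CRLF"
            return "CR"
    return None


detected = detect_first_eol(b"first line\r\nsecond line\n")
print(detected, EOL_CODE[detected])
```

A real reader would also need to handle a CR that falls on the last byte of a buffer, since the matching LF might arrive in the next buffer; this sketch reports such a case as CR.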
As per the technical meeting on 7/15/2024.
This sets the default EOL convention for input files to be ANY.
It also extends the possibilities for the externalformat parameter to OPENSTREAM. It can be a known format atom (e.g. :UTF-8) as before. But it can also be an EOL convention (CR, LF, CRLF, ANY) or a (format eolconvention) pair (e.g. (:XCCS LF)).
The motivation for this extension is to sneak the EOL convention into the :EXTERNAL-FORMAT optional argument to CL:OPEN. The Common Lisp spec doesn't allow arbitrary opening parameters to be specified, so we trick it, at least for the EOL convention, by overloading the external format argument (essentially treating the EOL as a funky external format).
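The three accepted shapes of the externalformat argument described above (a format atom, a bare EOL convention, or a (format eolconvention) pair) can be illustrated with a small Python model. The function name `parse_external_format` and the use of None to mean "fall back to the default" are assumptions for the sketch; the actual OPENSTREAM logic may differ.

```python
# Hypothetical model of interpreting the overloaded externalformat argument.
EOL_CONVENTIONS = ("CR", "LF", "CRLF", "ANY")


def parse_external_format(spec):
    """Return (format, eol); None in either slot means 'use the default'."""
    if isinstance(spec, str):
        if spec in EOL_CONVENTIONS:
            return (None, spec)      # bare EOL convention, e.g. LF
        return (spec, None)          # known format atom, e.g. :UTF-8
    if isinstance(spec, (list, tuple)) and len(spec) == 2:
        fmt, eol = spec              # pair, e.g. (:XCCS LF)
        if eol not in EOL_CONVENTIONS:
            raise ValueError(f"unknown EOL convention: {eol}")
        return (fmt, eol)
    raise ValueError(f"unrecognized external format: {spec!r}")


for example in (":UTF-8", "LF", (":XCCS", "LF")):
    print(example, "->", parse_external_format(example))
```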