Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add warnings for bare CR in line endings #8

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Add warnings for bare CR in line endings #8

wants to merge 2 commits into from

Conversation

jblang
Copy link

@jblang jblang commented Sep 3, 2017

I've added a warning to macro1.c for bare CR line endings if you think it's worth committing.

@markpizz
Copy link
Member

markpizz commented Sep 3, 2017

Does this emit a single message for each input file that is processed?

@jblang
Copy link
Author

jblang commented Sep 3, 2017

Currently it emits multiple messages: one for each string returned by fgets that contains a CR without an accompanying LF. I can modify it to do that if you prefer.

@jblang
Copy link
Author

jblang commented Sep 3, 2017

Since fgets is reading a maximum of 96 bytes per invocation, that amounts to a message every 96 bytes, assuming those 96 bytes contain \r without an accompanying \n. This may seem like spam, but the volume of messages this emits is dwarfed by the other messages that the assembler emits because it doesn't have any line breaks.

@markpizz
Copy link
Member

markpizz commented Sep 3, 2017

I suggested performing this test at file open time, so a single useful message could be emitted that contained the file name that is has the problematic line endings. Since that input is completely garbage, it would be reasonable to exit the program after emitting the single message about that file. The useful information will be displayed and the user can make the appropriate correction without having to dig through piles of noise to find the problem.

@jblang
Copy link
Author

jblang commented Sep 4, 2017

OK, i did another commit that will skip the rest of the file and log an error if it encounters a bare CR: https://github.com/jblang/simtools/blob/master/crossassemblers/macro1/macro1.c#L1879. The logic in these if statements is a bit tortured, but I couldn't figure out another way to do it without changing the entire control flow of the program. The check for pass == 1 is necessary because otherwise the message gets printed twice.

One other thing I noticed is that the nextfiodec function appears to actually be expecting bare CRs inside a fiodec string: https://github.com/jblang/simtools/blob/master/crossassemblers/macro1/macro1.c#L1519. I looked at the PDP-1 handbook and fiodec only has CR; there is no LF, so maybe it's valid in that case? So I am not sure if aborting on a bare CR would break some previously working source code or not. It's your call if you think it's too risky.

@markpizz
Copy link
Member

markpizz commented Sep 4, 2017

Well, the reference to the bare CR in the nextfiodec routine will actually result in an infinite loop if a bare CR is ever encountered there, so I doubt this test breaks anything. Meanwhile, checking every line for the presence of a bare CR will certainly limit any case where that might ever be legitimate input.

My suggestion was:

@@ -1335,6 +1335,8 @@ processLine() {
 /*  Synopsis:  Do one assembly pass. */
 void onePass() {
     int     ix;
     char    testline[LINELEN];
+    char    *ch;
 
     clc = 4;				/* Default location is 4 */
     start_addr = 0;			/* No starting address. */
@@ -1373,6 +1375,13 @@ void onePass() {
 	exit( -1 );
     }
 
+    /* Perform basic test to make sure this file has reasonable line endings */
+    if (fgets( testline, LINELEN - 1, infile ) != NULL &&
+        ( strlen( testline ) == LINELEN - 2 && (ch = strchr( testline, '\r' )) != NULL) && ch[1] != '\n') {
+            fprintf( stderr, "%s: unexpected line endings (bare CR) found in %s; fix with:\n\n"
+                                   "    $ dos2unix --convmode mac %s\n", save_argv[0], save_argv[filix_curr], save_argv[filix_curr]);
+	exit( -1 );
+    }
+    rewind(infile);
     for (;;) {
 	readLine();
 	if (end_of_input) {

@jblang
Copy link
Author

jblang commented Sep 5, 2017

So that just does a check on the first 96 characters? That would have caught it, at least in all the files I tried.

@markpizz
Copy link
Member

markpizz commented Sep 5, 2017

The line buffer is the maximum line size the program will support. If the first “line” of the file doesn’t have a reasonable line ending, we can hardly expect the input file to have meaningful content. If a user somehow provides an input file with some screwy combination of input data that mixes line endings (with some that only have CR) then they get the results of “garbage-in” => “garbage-out”. The original code already tolerates either LF or CRLF line endings. If the legal syntax of the input actually allows for imbedded CR within some part of the text of a "line", the use of this would hardly happen on the first line of the file. fgets() would read up to 94 characters until it finds a LF, so any number of CR characters may be contained in the line anywhere prior to the LF. A single CR immediately prior the ending LF will be squished out by readLine(), but any others will merely be data in the line.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants