-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bug #948: make HIGH/LOW-VALUE sensitive to program collating sequence #136
base: gcos4gnucobol-3.x
Are you sure you want to change the base?
Conversation
7693151
to
1ef55fb
Compare
Hi @GitMensch, |
Doing so now (I had quite a lot on my desk [don't ask how it currently looks like ;-) - but as my evening is free I'm inspecting this now, then go on to the other non-reviewed ones. |
Actually curious: how does it look like ? :D |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The general approach looks fine to me.
Please add the two missing test and possibly adjust the code.
... not the big point of this PR, but we should have at least one test for high-value and low-value of a PIC N
(which is UTF-16). Please ensure that it at least does not get worse as it may is.
Inspecting strings.c Thoughts? |
Yeah, that's something I should do. |
1ef55fb
to
dffe116
Compare
Actually not sure how to do that. I've moved Now I don't know what to do with Or should I just put the |
You could do that, using the old value if those aren't set. Note that you don't need the the fields referenced there, it would be enough to have a pointer to a string there containing its
For making it work it would be enough to set its |
dffe116
to
cf8580b
Compare
Codecov ReportAttention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## gcos4gnucobol-3.x #136 +/- ##
=====================================================
- Coverage 67.85% 67.60% -0.26%
=====================================================
Files 33 33
Lines 60458 60795 +337
Branches 15821 15882 +61
=====================================================
+ Hits 41026 41102 +76
- Misses 13567 13826 +259
- Partials 5865 5867 +2 ☔ View full report in Codecov by Sentry. |
cf8580b
to
8e23912
Compare
I think that's it ? |
Just from reading the Changelog: we now have a low-value field in the |
True indeed, I oversighted this. |
Thanks - before doing that, what about the compiler (see the edited comment above)? |
Yes, As for |
8e23912
to
60568a2
Compare
Just a note: using MF VC the tests all fail, no matter what you define or what you have in CHARSET directive or in the MFCODESET variable at runtime: low-value stays 1, high-value stays 256. But it does always return the expected |
I think you're right, having it one-time resolved (after the |
Ah, this is interesting. |
Something ... is going on there, I need to do more checks on MF (my latest test were confusing) but I need to do something else before going on with that. |
Hi @GitMensch, |
60568a2
to
2de72ad
Compare
Hi @GitMensch , |
Got my hands on a MF demo. Interrestingly, using |
e9515d8
to
7c37658
Compare
Hi @GitMensch , |
7c37658
to
c48603e
Compare
Hi @GitMensch, |
AT_CHECK([$COMPILE -febcdic-table=ebcdic500_latin1 -fdefault-colseq=EBCDIC prog.cob], [0], [], []) | ||
AT_CHECK([$COBCRUN_DIRECT ./prog], [0], [], []) | ||
|
||
AT_CLEANUP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GitMensch This origins from a coding pattern we've witnessed in a code base. Without the changes in this PR the search fails. Even if that requires a rather unusual collating sequence that leads to HIGH-VALUE
not being 0xff
, I think that's one more case for considering an upstream of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if that requires a rather unusual collating sequence
Not so unsual IMHO (EBCDIC-500 / Latin-1).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the information from those two discussion elements should be in either a COBOL or m4 comment within the testcase
ff7761b
to
e88347b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd need another look at this, but should be able to do so soon.
Apart from the rebase this PR should get to fix the CI, I made some code comments already.
Note that from IBM docs: HIGH-VALUE is in EBCDIC always x'FF', in NATIONAL always nx'FFFF', IBM notes that programmer's should take care that this values are not explicit/implicit converted because then they are replaced by a supplement character (not sure if this is relevant for the SEARCH..., just wanted top drop it)
I'd really like to interpret that as "HIGH-VALUE in native encoding is always x'FF'. Which makes it EBCDIC on mainframes, but ASCII on PCs (more or less). |
2b53538
to
06e91c9
Compare
06e91c9
to
03e5d23
Compare
Rebased - SVN r5406. |
03e5d23
to
2bd9740
Compare
2bd9740
to
def7796
Compare
Rebased - SVN r5408. |
057d3e5
to
c0347d7
Compare
c0347d7
to
768ad62
Compare
Rebased - SVN r5427. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requesting a rebase and relative minor changes, but we should be ready to roll soon
note: currently missing in this PR is a note in gnucobol.info where -febcdic-table is defined, to explain that using this can change HIGH-VALUE
if the ALPHABET used for the PROGRAM COLLATION is non-native; as well as a change note in NEWS that speaks about any non-native PROGRAM COLLATION (self-defined or adjusted to non-native) which ma reference that entry as well.
(and when this is merged we possibly point Vince/Eugenio to that, so that the programmer's guide may include a note on HIGH-/LOW-VALUE referencing the PROGRAM COLLATION)
... and, of course, we did not tackle high-value/low-value when used with unicode items...
/* Low and high values */ | ||
if (gen_figurative) { | ||
output_low_value (); | ||
output_high_value (); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can drop the comment and condition here.
the cob_all_low and cob_all_high fields from global to local; | ||
adjust the output_collating_tables function to use the tables and | ||
functions defined in typeck.c; set the new module field low_value | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scanner.l is missing here
prog->high_value = ascii_to_ebcdic[0xff]; | ||
#else | ||
if (CB_ALPHABET_NAME (x)->alphabet_type == CB_ALPHABET_EBCDIC) { | ||
prog->low_value = ebcdic_to_ascii[0x00]; | ||
prog->high_value = ebcdic_to_ascii[0xff]; | ||
#endif | ||
} else if (CB_ALPHABET_NAME (x)->alphabet_type == CB_ALPHABET_CUSTOM) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a bit confusing, please include the closing }
within the matching #if/#else (= two times), not outside
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... thinking further - as the customer alphabet has an early return, please move its handling before the ifdef - as it was before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add 2025 to the (C) year and if you had any other than the "style" changes mentioned below done after the initial implementation, add a "today" Changelog entry for those.
p->low_value = 0; | ||
p->high_value = 255; | ||
p->low_value_n = 0; | ||
p->high_value_n = 65535; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (not request): make those hex literals
@@ -15251,3 +15289,358 @@ AT_CHECK([$COMPILE prog.cob]) | |||
AT_CHECK([$COBCRUN_DIRECT ./prog], [0], [1.2345E-5 1.2345E-5], []) | |||
|
|||
AT_CLEANUP | |||
|
|||
|
|||
# See bug #948 - Comparison with HIGH-VALUE in presence of collating sequences |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this m4 comment should come after AT_KEYWORDS (with surrounding empty line)
[LOW-VALUE: 063 HIGH-VALUE: 192 | ||
LOW-VALUE OK | ||
HIGH-VALUE OK | ||
X > LOW-VALUE | ||
X < HIGH-VALUE | ||
], []) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to have the low+ + high-value output as DISPLAY ... WITH NO ADVANCING
and the others only output to stderr, if the y are not as expected, same for the occurrences below
ALPHABET alpha-custom IS | ||
65. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add, possibly in a separate testcase, an example with ALPHABET ... IS HIGH-VALUE QUOTE SPACE LOW-VALUE
- this should work and use the "native" values - so always x'00' and x'FF'.
INITIALIZE ENDXS. | ||
PERFORM VARYING IND FROM 1 BY 1 UNTIL ENDXS = HIGH-VALUE | ||
DISPLAY IND " " WITH NO ADVANCING | ||
IF IND = 9 | ||
MOVE HIGH-VALUE TO ENDXS | ||
END-IF | ||
END-PERFORM. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as I've missed that - please add a COBOL comment why we do this second loop
AT_CHECK([$COBCRUN_DIRECT ./prog6], [0], | ||
[01 02 03 04 05 06 07 08 09 01 02 03 04 05 06 07 08 09 ], []) | ||
|
||
AT_CLEANUP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be a program that checks ORD(HIGH-VALUE)
here as well?
This attempts to fix https://sourceforge.net/p/gnucobol/bugs/948/.
I the loading of collating tables has been moved from
codegen
totypeck
. On loading the program collating sequence, we compute the low and high values and store them in global variables. Codegen now use these variables instead of hard-coded 0/255/0xff. I've also checked the rest of the code to ensure I wasn't breaking anything (I paid attention to the distinction betweencb_low
/cb_high
andcb_norm_low
/cb_norm_high
).I also replaced the hard-coded
cob_refer_ascii
andcob_refer_ebcdic
tables by the loaded tables (since their content matches the one defined indefault.ttbl
).Added a few tests, in particular the one that was failing for our client.