Fix bug #948: make HIGH/LOW-VALUE sensitive to program collating sequence #136

ddeclerck · 2024-02-26T17:15:51Z

This attempts to fix https://sourceforge.net/p/gnucobol/bugs/948/.

I the loading of collating tables has been moved from codegen to typeck. On loading the program collating sequence, we compute the low and high values and store them in global variables. Codegen now use these variables instead of hard-coded 0/255/0xff. I've also checked the rest of the code to ensure I wasn't breaking anything (I paid attention to the distinction between cb_low/cb_high and cb_norm_low/cb_norm_high).

I also replaced the hard-coded cob_refer_ascii and cob_refer_ebcdic tables by the loaded tables (since their content matches the one defined in default.ttbl).

Added a few tests, in particular the one that was failing for our client.

ddeclerck · 2024-03-11T20:32:57Z

Hi @GitMensch,
Would you have a bit of time to review this PR ?
This is a blocking issue for our client.
Thanks

GitMensch · 2024-03-11T20:34:53Z

Doing so now (I had quite a lot on my desk [don't ask how it currently looks like ;-) - but as my evening is free I'm inspecting this now, then go on to the other non-reviewed ones.

ddeclerck · 2024-03-11T20:46:16Z

I had quite a lot on my desk [don't ask how it currently looks like ;-)

Actually curious: how does it look like ? :D

GitMensch

The general approach looks fine to me.
Please add the two missing test and possibly adjust the code.

... not the big point of this PR, but we should have at least one test for high-value and low-value of a PIC N (which is UTF-16). Please ensure that it at least does not get worse as it may is.

tests/testsuite.src/run_misc.at

cobc/typeck.c

GitMensch · 2024-03-11T21:37:33Z

Inspecting strings.c str_cob_low -> that always uses 0x00; if we put the low- and high-value into the cob_program, then we could make str_cob_low.data point to that.

Thoughts?

ddeclerck · 2024-03-12T15:01:32Z

Inspecting strings.c str_cob_low -> that always uses 0x00; if we put the low- and high-value into the cob_program, then we could make str_cob_low.data point to that.

Thoughts?

Yeah, that's something I should do.

ddeclerck · 2024-03-12T16:10:54Z

Inspecting strings.c str_cob_low -> that always uses 0x00; if we put the low- and high-value into the cob_program, then we could make str_cob_low.data point to that.
Thoughts?

Yeah, that's something I should do.

Actually not sure how to do that.

I've moved low_value and high_value from typeck.c to the cb_program strucure in tree.h.
codegen.c uses that information to output the cob_all_low and cob_all_high fields to the local include files (before that PR, that was in the global include file).

Now I don't know what to do with str_cob_low. Right now it is a static variable. I guess it would require making it non-static, and make it depend on the current program...

Or should I just put the cob_all_low and cob_all_high fields in cob_module instead, so I can use them through COB_MODULE_PTR in strings.c ?

GitMensch · 2024-03-12T17:12:55Z

Or should I just put the cob_all_low and cob_all_high fields in cob_module instead, so I can use them through COB_MODULE_PTR in strings.c ?

You could do that, using the old value if those aren't set. Note that you don't need the the fields referenced there, it would be enough to have a pointer to a string there containing its .data (which would be a string constant in the generated code.
If the pointers are NULL you'd use the old values (always "\x00" / "\xFF", I guess).

Now I don't know what to do with str_cob_low. Right now it is a static variable. I guess it would require making it non-static, and make it depend on the current program...

For making it work it would be enough to set its .data according to the current program. For making it good, you'd place that locally in one of the callers, (where you then also setting the .data) and pass the reference around as/if needed.

codecov-commenter · 2024-03-13T09:37:53Z

⚠️ Please install the to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 91.22807% with 5 lines in your changes missing coverage. Please review.

Project coverage is 67.60%. Comparing base (16e84a5) to head (768ad62).
Report is 35 commits behind head on gcos4gnucobol-3.x.

Files with missing lines	Patch %	Lines
cobc/codegen.c	89.47%	2 Missing ⚠️
libcob/strings.c	50.00%	1 Missing and 1 partial ⚠️
cobc/typeck.c	96.42%	0 Missing and 1 partial ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@                  Coverage Diff                  @@
##           gcos4gnucobol-3.x     #136      +/-   ##
=====================================================
- Coverage              67.85%   67.60%   -0.26%     
=====================================================
  Files                     33       33              
  Lines                  60458    60795     +337     
  Branches               15821    15882      +61     
=====================================================
+ Hits                   41026    41102      +76     
- Misses                 13567    13826     +259     
- Partials                5865     5867       +2

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

ddeclerck · 2024-03-13T09:54:22Z

I think that's it ?

GitMensch · 2024-03-13T09:59:13Z

Just from reading the Changelog: we now have a low-value field in the cob_module and cb_program; shouldn't we have the collating sequence in there already (or if not: add it)? If so, couldn't we just use its first (or last) position?

ddeclerck · 2024-03-13T10:04:55Z

Just from reading the Changelog: we now have a low-value field in the module; shouldn't we have the collating sequence in there already (or in general)?If so, couldn't we just use its first position?

True indeed, I oversighted this.
We could directly use module->collating_sequence[0] - when collating_sequence is not NULL.
I'll make the change.

GitMensch · 2024-03-13T10:06:35Z

Thanks - before doing that, what about the compiler (see the edited comment above)?

ddeclerck · 2024-03-13T10:28:13Z

Just from reading the Changelog: we now have a low-value field in the cob_module and cb_program; shouldn't we have the collating sequence in there already (or if not: add it)? If so, couldn't we just use its first (or last) position?

Yes, cob_module already contains the collating sequence (as an array of characters).
I made the change to use that in strings.c.

As for cb_program, the collating sequence field is a cb_tree, and it is slightly more involved to determine the low and high values (see cb_validate_collating in typeck.c), so I believe it is a good idea to cache the result of this computation in the cb_program fields low_value and high_value. What do you think ?

GitMensch · 2024-03-13T10:35:32Z

Just a note: using MF VC the tests all fail, no matter what you define or what you have in CHARSET directive or in the MFCODESET variable at runtime: low-value stays 1, high-value stays 256.

But it does always return the expected X < HIGH-VALUE.

GitMensch · 2024-03-13T10:36:56Z

What do you think ?

I think you're right, having it one-time resolved (after the OBJECT-COMPUTER paragraph was parsed) and stored to cb_program makes sense.

ddeclerck · 2024-03-13T10:42:06Z

Just a note: using MF VC the tests all fail, no matter what you define or what you have in CHARSET directive or in the MFCODESET variable at runtime: low-value stays 1, high-value stays 256.

Ah, this is interesting.
So MF always use 1 and 256 for low and high value ?
Should we add a dialect option to reproduce this behavior in GnuCOBOL ?

GitMensch · 2024-03-13T10:51:24Z

Something ... is going on there, I need to do more checks on MF (my latest test were confusing) but I need to do something else before going on with that.

ddeclerck · 2024-03-15T16:37:39Z

Something ... is going on there, I need to do more checks on MF (my latest test were confusing) but I need to do something else before going on with that.

Hi @GitMensch,
Do you have any news about this matter ?

ddeclerck · 2024-04-04T18:32:32Z

Something ... is going on there, I need to do more checks on MF (my latest test were confusing) but I need to do something else before going on with that.

Hi @GitMensch ,
Did you have a chance to investigate this further ?

ddeclerck · 2024-05-14T13:22:43Z

Got my hands on a MF demo. Interrestingly, using ORD on LOW-VALUE and HIGH-VALUE always give the same result (1 and 256), no matter the encoding. However, exploring the internal representation of fields (using REDEFINES with USAGE BINARY-CHAR) after moving LOW-VALUE and HIGH-VALUE do show the expected values. Maybe it's just ORD behaving differently on MF ?

ddeclerck · 2024-07-23T14:05:22Z

Hi @GitMensch ,
Is there anything we can do about that ?
(we keep being asked about this patch)

ddeclerck · 2024-08-20T14:50:20Z

Hi @GitMensch,
This PR requires some love ❤️
It's getting harder to rebase each time, as my memory of these matters fades away 😅

nberth · 2024-08-23T08:37:12Z

tests/testsuite.src/run_misc.at

+AT_CHECK([$COMPILE -febcdic-table=ebcdic500_latin1 -fdefault-colseq=EBCDIC prog.cob], [0], [], [])
+AT_CHECK([$COBCRUN_DIRECT ./prog], [0], [], [])
+
+AT_CLEANUP


@GitMensch This origins from a coding pattern we've witnessed in a code base. Without the changes in this PR the search fails. Even if that requires a rather unusual collating sequence that leads to HIGH-VALUE not being 0xff, I think that's one more case for considering an upstream of this PR.

Even if that requires a rather unusual collating sequence

Not so unsual IMHO (EBCDIC-500 / Latin-1).

the information from those two discussion elements should be in either a COBOL or m4 comment within the testcase

GitMensch

I'd need another look at this, but should be able to do so soon.
Apart from the rebase this PR should get to fix the CI, I made some code comments already.

Note that from IBM docs: HIGH-VALUE is in EBCDIC always x'FF', in NATIONAL always nx'FFFF', IBM notes that programmer's should take care that this values are not explicit/implicit converted because then they are replaced by a supplement character (not sure if this is relevant for the SEARCH..., just wanted top drop it)

cobc/typeck.c

cobc/tree.c

cobc/typeck.c

tests/testsuite.src/run_misc.at

ddeclerck · 2024-11-03T20:34:51Z

Note that from IBM docs: HIGH-VALUE is in EBCDIC always x'FF', in NATIONAL always nx'FFFF'

I'd really like to interpret that as "HIGH-VALUE in native encoding is always x'FF'. Which makes it EBCDIC on mainframes, but ASCII on PCs (more or less).

ddeclerck · 2024-12-20T09:09:37Z

Rebased - SVN r5406.

ddeclerck · 2024-12-30T10:14:07Z

Rebased - SVN r5408.

…ollating sequence

ddeclerck · 2025-01-27T09:19:27Z

Rebased - SVN r5427.

GitMensch

requesting a rebase and relative minor changes, but we should be ready to roll soon

note: currently missing in this PR is a note in gnucobol.info where -febcdic-table is defined, to explain that using this can change HIGH-VALUE if the ALPHABET used for the PROGRAM COLLATION is non-native; as well as a change note in NEWS that speaks about any non-native PROGRAM COLLATION (self-defined or adjusted to non-native) which ma reference that entry as well.

(and when this is merged we possibly point Vince/Eugenio to that, so that the programmer's guide may include a note on HIGH-/LOW-VALUE referencing the PROGRAM COLLATION)

... and, of course, we did not tackle high-value/low-value when used with unicode items...

GitMensch · 2025-03-04T15:43:18Z

cobc/codegen.c

+	/* Low and high values */
+	if (gen_figurative) {
+		output_low_value ();
+		output_high_value ();
+	}


I think we can drop the comment and condition here.

GitMensch · 2025-03-04T16:05:39Z

cobc/ChangeLog

+	  the cob_all_low and cob_all_high fields from global to local;
+	  adjust the output_collating_tables function to use the tables and
+	  functions defined in typeck.c; set the new module field low_value
+


scanner.l is missing here

GitMensch · 2025-03-04T16:08:26Z

cobc/typeck.c

+                prog->high_value = ascii_to_ebcdic[0xff];
+#else
+	if (CB_ALPHABET_NAME (x)->alphabet_type == CB_ALPHABET_EBCDIC) {
+                prog->low_value = ebcdic_to_ascii[0x00];
+                prog->high_value = ebcdic_to_ascii[0xff];
+#endif
+	} else if (CB_ALPHABET_NAME (x)->alphabet_type == CB_ALPHABET_CUSTOM) {


that's a bit confusing, please include the closing }within the matching #if/#else (= two times), not outside

... thinking further - as the customer alphabet has an early return, please move its handling before the ifdef - as it was before

GitMensch · 2025-03-04T16:11:45Z

cobc/typeck.c

Please add 2025 to the (C) year and if you had any other than the "style" changes mentioned below done after the initial implementation, add a "today" Changelog entry for those.

GitMensch · 2025-03-04T16:15:32Z

cobc/tree.c

+	p->low_value = 0;
+	p->high_value = 255;
+	p->low_value_n = 0;
+	p->high_value_n = 65535;


suggestion (not request): make those hex literals

GitMensch · 2025-03-04T16:57:41Z

tests/testsuite.src/run_misc.at

@@ -15251,3 +15289,358 @@ AT_CHECK([$COMPILE prog.cob])
 AT_CHECK([$COBCRUN_DIRECT ./prog], [0], [1.2345E-5 1.2345E-5], [])

 AT_CLEANUP
+
+
+# See bug #948 - Comparison with HIGH-VALUE in presence of collating sequences


this m4 comment should come after AT_KEYWORDS (with surrounding empty line)

GitMensch · 2025-03-04T16:59:25Z

tests/testsuite.src/run_misc.at

+[LOW-VALUE: 063 HIGH-VALUE: 192
+LOW-VALUE OK
+HIGH-VALUE OK
+X > LOW-VALUE
+X < HIGH-VALUE
+], [])


I tend to have the low+ + high-value output as DISPLAY ... WITH NO ADVANCING and the others only output to stderr, if the y are not as expected, same for the occurrences below

GitMensch · 2025-03-04T17:05:30Z

tests/testsuite.src/run_misc.at

+           ALPHABET alpha-custom IS
+               65.


Please add, possibly in a separate testcase, an example with ALPHABET ... IS HIGH-VALUE QUOTE SPACE LOW-VALUE - this should work and use the "native" values - so always x'00' and x'FF'.

GitMensch · 2025-03-04T17:09:15Z

tests/testsuite.src/run_misc.at

+           INITIALIZE ENDXS.
+           PERFORM VARYING IND FROM 1 BY 1 UNTIL ENDXS = HIGH-VALUE
+              DISPLAY IND " " WITH NO ADVANCING
+              IF IND = 9
+                 MOVE HIGH-VALUE TO ENDXS
+              END-IF
+           END-PERFORM.


as I've missed that - please add a COBOL comment why we do this second loop

GitMensch · 2025-03-04T17:10:17Z

tests/testsuite.src/run_misc.at

+AT_CHECK([$COBCRUN_DIRECT ./prog6], [0],
+[01 02 03 04 05 06 07 08 09 01 02 03 04 05 06 07 08 09 ], [])
+
+AT_CLEANUP


Should there be a program that checks ORD(HIGH-VALUE) here as well?

ddeclerck force-pushed the fix_high_value branch from 7693151 to 1ef55fb Compare February 26, 2024 17:17

ddeclerck requested a review from GitMensch February 26, 2024 17:17

ddeclerck requested review from GitMensch and removed request for GitMensch March 5, 2024 16:25

GitMensch requested changes Mar 11, 2024

View reviewed changes

tests/testsuite.src/run_misc.at Show resolved Hide resolved

tests/testsuite.src/run_misc.at Show resolved Hide resolved

cobc/typeck.c Outdated Show resolved Hide resolved

cobc/typeck.c Show resolved Hide resolved

GitMensch mentioned this pull request Mar 11, 2024

Move static variables to local state structures in libcob/strings.c #137

Closed

ddeclerck force-pushed the fix_high_value branch from 1ef55fb to dffe116 Compare March 12, 2024 16:03

ddeclerck force-pushed the fix_high_value branch from dffe116 to cf8580b Compare March 13, 2024 09:33

ddeclerck force-pushed the fix_high_value branch from cf8580b to 8e23912 Compare March 13, 2024 09:44

ddeclerck force-pushed the fix_high_value branch from 8e23912 to 60568a2 Compare March 13, 2024 10:32

ddeclerck force-pushed the fix_high_value branch from 60568a2 to 2de72ad Compare April 2, 2024 14:26

ddeclerck added the High priority label Apr 9, 2024

ddeclerck mentioned this pull request May 14, 2024

Fix MSVC builds - missing profiling.c #144

Closed

ddeclerck force-pushed the fix_high_value branch from 2de72ad to e9515d8 Compare May 29, 2024 21:42

ddeclerck force-pushed the fix_high_value branch from e9515d8 to 7c37658 Compare June 14, 2024 12:19

ddeclerck force-pushed the fix_high_value branch from 7c37658 to c48603e Compare August 20, 2024 14:07

nberth reviewed Aug 23, 2024

View reviewed changes

ddeclerck mentioned this pull request Aug 25, 2024

Fixes for bug reported under MSVC Debug #175

Closed

ddeclerck force-pushed the fix_high_value branch from ff7761b to e88347b Compare October 2, 2024 15:07

GitMensch requested changes Nov 3, 2024

View reviewed changes

ddeclerck force-pushed the fix_high_value branch 2 times, most recently from 2b53538 to 06e91c9 Compare November 6, 2024 08:47

ddeclerck force-pushed the fix_high_value branch from 06e91c9 to 03e5d23 Compare December 20, 2024 09:08

ddeclerck force-pushed the fix_high_value branch from 03e5d23 to 2bd9740 Compare December 20, 2024 16:39

ddeclerck force-pushed the fix_high_value branch from 2bd9740 to def7796 Compare December 30, 2024 10:13

ddeclerck force-pushed the fix_high_value branch 2 times, most recently from 057d3e5 to c0347d7 Compare January 6, 2025 17:15

ddeclerck and others added 3 commits January 27, 2025 10:17

Fix bug #948: make HIGH/LOW-VALUE sensitive to ASCII/EBCDIC program c…

80957d8

…ollating sequence

Add a SEARCH ALL test with non-0xff HIGH-VALUE

4d96940

Improvements

768ad62

ddeclerck force-pushed the fix_high_value branch from c0347d7 to 768ad62 Compare January 27, 2025 09:18

GitMensch requested changes Mar 4, 2025

View reviewed changes

Fix bug #948: make HIGH/LOW-VALUE sensitive to program collating sequence #136

Are you sure you want to change the base?

Fix bug #948: make HIGH/LOW-VALUE sensitive to program collating sequence #136

Conversation

ddeclerck commented Feb 26, 2024 • edited Loading

ddeclerck commented Mar 11, 2024

GitMensch commented Mar 11, 2024

ddeclerck commented Mar 11, 2024

GitMensch left a comment

Choose a reason for hiding this comment

GitMensch commented Mar 11, 2024

ddeclerck commented Mar 12, 2024

ddeclerck commented Mar 12, 2024

GitMensch commented Mar 12, 2024

codecov-commenter commented Mar 13, 2024 • edited Loading

Codecov Report

ddeclerck commented Mar 13, 2024

GitMensch commented Mar 13, 2024 • edited Loading

ddeclerck commented Mar 13, 2024

GitMensch commented Mar 13, 2024

ddeclerck commented Mar 13, 2024

GitMensch commented Mar 13, 2024 • edited Loading

GitMensch commented Mar 13, 2024

ddeclerck commented Mar 13, 2024 • edited Loading

GitMensch commented Mar 13, 2024

ddeclerck commented Mar 15, 2024

ddeclerck commented Apr 4, 2024

ddeclerck commented May 14, 2024 • edited Loading

ddeclerck commented Jul 23, 2024

ddeclerck commented Aug 20, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

GitMensch left a comment

Choose a reason for hiding this comment

ddeclerck commented Nov 3, 2024

ddeclerck commented Dec 20, 2024

ddeclerck commented Dec 30, 2024

ddeclerck commented Jan 27, 2025

GitMensch left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ddeclerck commented Feb 26, 2024 •

edited

Loading

codecov-commenter commented Mar 13, 2024 •

edited

Loading

GitMensch commented Mar 13, 2024 •

edited

Loading

GitMensch commented Mar 13, 2024 •

edited

Loading

ddeclerck commented Mar 13, 2024 •

edited

Loading

ddeclerck commented May 14, 2024 •

edited

Loading

GitMensch left a comment •

edited

Loading