Skip to content

Commit

Permalink
improved performance for comparisons between numeric DISPLAY, numeric…
Browse files Browse the repository at this point in the history
… DISPLAY to literal, as well as BCD + ZERO and to other (and BCD zero)

co-authored by @chaat

libcob:
* numeric.c (cob_numdisp_cmp): new version cob_numeric_display_cmp that does NOT do any conversion to integer formats, also it does not use any temporary buffers to do the compare which should improve its performance
* numeric.c (cob_numeric_display_cmp_zero, cob_bcd_cmp_zero), common.h: new functions for optimized check of common compare
* numeric.c (cob_set_packed_u64): restore cleanup code for padding byte, but only execute it if we wrote to first data position

cobc/typeck.c:
* (cb_build_optim_cond): generate calls to new functions for numeric zero compare for USAGE DISPLAY and PACKED, as well as comparison between two USAGE DISPLAY or USAGE DISPLAY to literal
* (validate_move): internally change literals like 0.00 to zero constant allowing use of optimized MOVE code
sf-mensch committed Jan 28, 2024
1 parent 12e31f9 commit 47ffbd8
Showing 6 changed files with 402 additions and 67 deletions.
97 changes: 51 additions & 46 deletions NEWS
Original file line number Diff line number Diff line change
@@ -15,6 +15,9 @@ NEWS - user visible changes -*- outline -*-
* Important Bugfixes

** #904: MOVE PACKED-DECIMAL unsigned to signed led to bad sign
** Padding bytes of BCD may store a non-truncated digit; while
this has no effect on calculations it can create problems on
later binary comparison of the field as well as on group MOVEs
** #918: COB_LS_VALIDATE (io status 09 and 71) partial broken

* Changes to the COBOL compiler (cobc) options:
@@ -31,9 +34,57 @@ NEWS - user visible changes -*- outline -*-
* More notable changes

** execution times were significantly reduced for the following:
comparison between a numeric DISPLAY variable to another or to a literal
comparison between numeric DISPLAY or BCD variable to zero
INSPECT CONVERTING (and "simple" INSPECT REPLACING), in general
and especially if both from and to are constants

* Known issues in 3.x

** testsuite:
* if built with vbisam, cisam or disam, depending on the version used, some
tests will lead to UNEXPECTED PASS, while others may fail
* possibly failing tests (false positives):
* temporary path invalid
* compiler outputs (assembler)
* compile from stdin
* NIST: OBNC1M.CBL false positive (the test runner uses a nonportable way of
emulating a program kill)
* if build with -fsanitize, then some tests will fail; while we accept patches
to improve that, we don't consider the failing tests as bug in GnuCOBOL

** the recent additions of ">> TURN" and "variable LIKE variable" may not work
as expected in all cases

** floating-point comparison for equality may return unexpected results as it
involves a necessary tolerance; you may adjust the default tolerance of
0.0000001 by compiling GnuCOBOL for example with
LIBCOB_CPPFLAGS="-DCOB_FLOAT_DELTA=0.0000000000001";
we seek input for a reasonable default for GnuCOBOL 4 (use the mailing list
or discussion board to share your comments on this topic, keeping in mind
that this has to take both mathematical and "C compiler portability" into
account)

** variables containing PICTURE symbol P may lead to wrong results in rare
cases (especially screenio) - please send a bug report if you catch a case;
since GC 3.2 rc3 all arithmetic operations and MOVE are believed to be
correct

** features that are known to not be portable to every environment yet
(especially when using a different compiler than GCC)
* function with variable-length RETURNING item
* USAGE POINTER, which may need to be manually aligned

** all versions of GnuCOBOL so far: EVALUATE evaluates all subjects on *each*
WHEN (the standard explicit requests a one-time evaluation of the subjects,
then comparing the value); to work around possible issues move more complex
subjects like variables with subscripts and reference-modification, as well
as calculated subjects and function calls to a temporary variable and use
this as subject for the EVALUATE

For more known issues see the bug tracker.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

GnuCOBOL 3.2 (20230728)
GnuCOBOL 3.2rc1 (20230118)
@@ -537,52 +588,6 @@ NEWS - user visible changes -*- outline -*-

** undocumented option -tsymbols, which was replaced by -ftsymbols in 3.0


* Known issues in 3.2 (and 3.1)

** testsuite:
* if built with vbisam, cisam or disam, depending on the version used, some
tests will lead to UNEXPECTED PASS, while others may fail
* possibly failing tests (false positives):
* temporary path invalid
* compiler outputs (assembler)
* compile from stdin
* NIST: OBNC1M.CBL false positive (the test runner uses a nonportable way of
emulating a program kill)
* if build with -fsanitize, then some tests will fail; while we accept patches
to improve that, we don't consider the failing tests as bug in GnuCOBOL

** the recent additions of ">> TURN" and "variable LIKE variable" may not work
as expected in all cases

** floating-point comparison for equality may return unexpected results as it
involves a necessary tolerance; you may adjust the default tolerance of
0.0000001 by compiling GnuCOBOL for example with
LIBCOB_CPPFLAGS="-DCOB_FLOAT_DELTA=0.0000000000001";
we seek input for a reasonable default for GnuCOBOL 4 (use the mailing list
or discussion board to share your comments on this topic, keeping in mind
that this has to take both mathematical and "C compiler portability" into
account)

** variables containing PICTURE symbol P may lead to wrong results in rare
cases (especially screenio) - please send a bug report if you catch a case;
since GC 3.2 rc3 all arithmetic operations and MOVE are believed to be
correct

** features that are known to not be portable to every environment yet
(especially when using a different compiler than GCC)
* function with variable-length RETURNING item
* USAGE POINTER, which may need to be manually aligned

** all versions of GnuCOBOL so far: EVALUATE evaluates all subjects on *each*
WHEN (the standard explicit requests a one-time evaluation of the subjects,
then comparing the value); to work around possible issues move more complex
subjects like variables with subscripts and reference-modification, as well
as calculated subjects and function calls to a temporary variable and use
this as subject for the EVALUATE

For more known issues see the bug tracker.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

GnuCOBOL 3.1.2 released (20201223)
8 changes: 8 additions & 0 deletions cobc/ChangeLog
Original file line number Diff line number Diff line change
@@ -5,6 +5,14 @@
do not move object files and preprocess files when they were
specified as an explicit target on the command line (-E, -c)

2023-09-20 Simon Sobisch <simonsobisch@gnu.org>

* typeck.c (cb_build_optim_cond): generate calls to new functions
for numeric zero compare for USAGE DISPLAY and PACKED, as well
as comparison between two USAGE DISPLAY or USAGE DISPLAY to literal
* typeck.c (validate_move): internally change literals like 0.00
to zero constant allowing use of optimized MOVE code

2023-10-17 David Declerck <david.declerck@ocamlpro.com>

BUG #923: generated modules init/clear unused decimal constants
85 changes: 69 additions & 16 deletions cobc/typeck.c
Original file line number Diff line number Diff line change
@@ -6701,7 +6701,7 @@ cb_build_optim_cond (struct cb_binary_op *p)
const char *s;
size_t n;
const cb_tree left = p->x;
const cb_tree right = p->y;
cb_tree right = p->y;
struct cb_field *f = CB_REF_OR_FIELD_P (left)
? CB_FIELD_PTR (left) : NULL;

@@ -6723,10 +6723,19 @@ cb_build_optim_cond (struct cb_binary_op *p)
if (!cb_fits_long_long (right)) {
return NULL;
}
/* CHECKME: can we have a non-field left and field right? */
return CB_BUILD_FUNCALL_2 ("cob_cmp_llint", left,
cb_build_cast_llint (right));
}

/* test for numeric zero literal */
if (CB_LITERAL_P (right)) {
struct cb_literal *l = CB_LITERAL (right);
if (memcmp (l->data, COB_ZEROES_ALPHABETIC, l->size) == 0) {
right = cb_zero;
}
}

#if 0 /* TODO: if the right side is a literal: then build an ideal
memcmp as if it was a field of same attributes as left-side,
with the value of the literal */
@@ -6740,6 +6749,39 @@ cb_build_optim_cond (struct cb_binary_op *p)
}
}
#endif

/* if the field is DISPLAY and the right side either a literal, a constant (ZERO)
or also a DISPLAY field, then no need to convert the field(s) to an integer */
if (f->usage == CB_USAGE_DISPLAY) {
if (CB_REF_OR_FIELD_P (right)) {
if (CB_FIELD_PTR (right)->usage == CB_USAGE_DISPLAY) {
return CB_BUILD_FUNCALL_2 ("cob_numeric_display_cmp", left, right);
}
} else
if (CB_LITERAL_P (right)) {
if (f->pic->scale
|| f->pic->digits >= 19
|| ( CB_LITERAL_P (right)
&& ( CB_LITERAL (right)->scale
|| CB_LITERAL (right)->size > 19))) {
return CB_BUILD_FUNCALL_2 ("cob_numeric_display_cmp", left, right);
}
} else
if (right == cb_zero) {
if (!f->flag_sign_separate
&& !f->flag_any_numeric
&& !cb_ebcdic_sign) {
return CB_BUILD_FUNCALL_1 ("cob_numeric_display_cmp_zero", left);
}
/* for simple fields an integer-comparision is fast and inlined, so
we only use the DISPLAY compare if it is "complex" */
if (f->pic->scale
|| f->pic->digits >= 19) {
return CB_BUILD_FUNCALL_2 ("cob_numeric_display_cmp", left, right);
}
}
}

if (f->usage == CB_USAGE_PACKED
|| f->usage == CB_USAGE_COMP_6) {
if (CB_REF_OR_FIELD_P (right)) {
@@ -6757,6 +6799,9 @@ cb_build_optim_cond (struct cb_binary_op *p)
}
}
}
} else
if (right == cb_zero) {
return CB_BUILD_FUNCALL_1 ("cob_bcd_cmp_zero", left);
}
}

@@ -10210,8 +10255,8 @@ validate_inspect (cb_tree x, cb_tree y, const unsigned int replacing_or_converti
size2 = CB_LITERAL (y)->size;
break;
case CB_TAG_CONST:
/* note: in case of CONST (like SPACES or LOW-VALUES)
the original size is used in libcob */
/* note: in case of CONST (like SPACES or LOW-VALUES)
the original size is used in libcob */
/* Fall-through */
default:
size2 = 0;
@@ -11070,23 +11115,24 @@ validate_move (cb_tree src, cb_tree dst, const unsigned int is_value, int *move_
}
if (leftmost_significant == l->size) {
most_significant = -999;
least_significant = 999;
} else {
most_significant = l->size - l->scale - leftmost_significant;
if (most_significant < 1) most_significant--;
}

/* Compute the least significant figure place
in relatation to the decimal point (negative = decimal position) */
for (i = l->size - 1; i != 0; i--) {
if (l->data[i] != '0') {
break;
/* Compute the least significant figure place
in relatation to the decimal point (negative = decimal position) */
for (i = l->size - 1; i != 0; i--) {
if (l->data[i] != '0') {
break;
}
}
if (i == 0) {
least_significant = 999;
} else {
least_significant = l->size - l->scale - i;
if (least_significant < 1) least_significant--;
}
}
if (i == 0) {
least_significant = 999;
} else {
least_significant = (l->size - l->scale) - i;
if (least_significant < 1) least_significant--;
}

/* Value check */
@@ -11123,6 +11169,13 @@ validate_move (cb_tree src, cb_tree dst, const unsigned int is_value, int *move_
case CB_CATEGORY_NUMERIC:
{
const struct cb_picture *pic = fdst->pic;
if (most_significant == -999
&& l->sign == 0) {
/* replace assignments of unsigned 000.0000 to a numeric value
by optimized zero-move */
*move_zero = 1;
break;
}
if (pic->scale < 0) {
/* Check for PIC 9(n)P(m) */
if (least_significant <= -pic->scale) {
@@ -11168,7 +11221,7 @@ validate_move (cb_tree src, cb_tree dst, const unsigned int is_value, int *move_
/* Size check */
if (fdst->flag_real_binary
|| ( !cb_binary_truncate
&& fdst->pic->scale <= 0
&& fdst->pic->scale <= 0
&& ( fdst->usage == CB_USAGE_COMP_5
|| fdst->usage == CB_USAGE_COMP_X
|| fdst->usage == CB_USAGE_COMP_N
14 changes: 14 additions & 0 deletions libcob/ChangeLog
Original file line number Diff line number Diff line change
@@ -21,6 +21,20 @@
* fileio.c (lineseq_write, lineseq_rewrite): fix bug #918 partial broken
COB_LS_VALIDATE by incrementing pointer outside of macro IS_BAD_CHAR

2023-09-20 Simon Sobisch <simonsobisch@gnu.org>

* numeric.c (cob_numdisp_cmp): adjusted
* numeric.c (cob_numeric_display_cmp_zero, cob_bcd_cmp_zero), common.h:
new functions for optimized check of common compare
* numeric.c (cob_set_packed_u64): restore cleanup code for padding byte,
but only execute it if we wrote to first data position

2023-09-19 Chuck Haatvedt <chuck.haatvedt+cobol@gmail.com>

* numeric.c (cob_numdisp_cmp): new version cob_numeric_display_cmp that
does NOT do any conversion to integer formats, also it does not use any
temporary buffers to do the compare which should improve its performance

2023-09-15 Simon Sobisch <simonsobisch@gnu.org>

Fixing Bug #914 CLOSE LOCK abends program on OPEN
3 changes: 3 additions & 0 deletions libcob/common.h
Original file line number Diff line number Diff line change
@@ -1935,6 +1935,9 @@ COB_EXPIMP int cob_check_linkage_size (const char *,
/* Comparison functions */
COB_EXPIMP int cob_numeric_cmp (cob_field *, cob_field *);
COB_EXPIMP int cob_bcd_cmp (cob_field *, cob_field *);
COB_EXPIMP int cob_numeric_display_cmp (cob_field *, cob_field *);
COB_EXPIMP int cob_numeric_display_cmp_zero (cob_field *);
COB_EXPIMP int cob_bcd_cmp_zero (cob_field *);

/*******************************/
/* Functions in strings.c */
262 changes: 257 additions & 5 deletions libcob/numeric.c
Original file line number Diff line number Diff line change
@@ -1210,7 +1210,7 @@ cob_decimal_set_packed (cob_decimal *d, cob_field *f)
mpz_mul_ui (d->value, d->value, 10000UL);
}
mpz_add_ui (d->value, d->value,
( pack_to_bin[*p] * 100)
( (int)pack_to_bin[*p] * 100)
+ pack_to_bin[*(p + 1)]);
/* because of the zero-skipping we are always nonzero here */
nonzero = 1;
@@ -1398,9 +1398,9 @@ cob_set_packed_u64 (cob_field *f, const cob_u64_t val, const int sign)
*p = packed_bytes[n % 100];
}

#if 0 /* clean first half-byte;
would only be necessary if not zeroe'd out above */
{
/* clean first half-byte to handle truncation
as we may have stored 200 in S99's padding byte */
if (p <= f->data) {
const short scale = COB_FIELD_SCALE (f);
short digits;
if (scale >= 0) {
@@ -1419,7 +1419,6 @@ cob_set_packed_u64 (cob_field *f, const cob_u64_t val, const int sign)
*(f->data) &= 0x0FU;
}
}
#endif
}

/* set the specified BCD field 'f' to the given integer 'val';
@@ -4026,6 +4025,250 @@ cob_bcd_cmp (cob_field *f1, cob_field *f2)
}
#endif

#ifndef NO_DISPLAY_COMPARE
static COB_INLINE COB_A_INLINE int
compare_data (unsigned char *ptr1, unsigned char *ptr2, int len)
{
unsigned char *ptr_end = ptr1 + len;

do {
const unsigned char p1 = COB_D2I (*ptr1);
const unsigned char p2 = COB_D2I (*ptr2);
if (p1 != p2) {
if (p1 > p2) {
return 1;
}
return -1;
}
ptr1++;
ptr2++;
} while (ptr1 != ptr_end);

return 0;
}

static int
check_all_zero (unsigned char *ptr, int len)
{
unsigned char *ptr_end = ptr + len;

do {
if (COB_D2I (*ptr++) != 0) {
return 0;
}
} while (ptr != ptr_end);

return 1;
}

int
cob_numeric_display_cmp (cob_field *f1, cob_field *f2)
{
int compare_result, len;
int f1_strt_offset, f1_end_offset;
int f2_strt_offset, f2_end_offset;

/************************************************************/
/* first we need to get the scale, data size and sign of */
/* each of the two cob fields. Note that if the signs are */
/* different, then we only need to find if either of the */
/* fields is non-zero. */
/************************************************************/

const short scale1 = COB_FIELD_SCALE (f1);
const short scale2 = COB_FIELD_SCALE (f2);
/* numeric DISPLAY fields from a program are limited to COB_MAX_DIGITS,
internal created numeric DISPLAY fields from intrinsic functions
in intrinsic.c (cob_alloc_field, cob_decimal_move_temp) could
_theoretically_ use up to SHRT_MAX */
const short size1 = (short)COB_FIELD_SIZE (f1);
const short size2 = (short)COB_FIELD_SIZE (f2);
unsigned char *ptr_f1 = COB_FIELD_DATA (f1);
unsigned char *ptr_f2 = COB_FIELD_DATA (f2);

const int orig_sign1 = COB_GET_SIGN_ADJUST (f1);
const int orig_sign2 = COB_GET_SIGN_ADJUST (f2);

int ret;

const int sign1 = orig_sign1 >= 0 ? 1 :
check_all_zero (ptr_f1, size1) ? 1 : - 1;
const int sign2 = orig_sign2 >= 0 ? 1 :
check_all_zero (ptr_f2, size2) ? 1 : -1;

if (sign1 != sign2) {
ret = sign1;
goto over;
}
/************************************************************/
/* first we need to find the start and end of each field */
/* relative to the decimal point so that we can compare */
/* them correctly based on the position relative to the */
/* decimal point. */
/* */
/* the comparison of two fields containing display */
/* numeric data will result in one of three conditions */
/* */
/* one field will be a subset of the other. */
/* one field would intersect with the other. */
/* the fields would not intersect at all. */
/* */
/* so these are the 3 conditions we need to check for. */
/* this will result in 3 compares being needed. */
/************************************************************/

f1_strt_offset = size1 - scale1;
f1_end_offset = 0 - scale1;

f2_strt_offset = size2 - scale2;
f2_end_offset = 0 - scale2;

/************************************************************/
/* the first compare will be to check the leading / pad */
/* positions in either of the fields to check if they are */
/* all zeros. if not then we don't need to compare any */
/* farther, we are done. */
/************************************************************/


if (f1_strt_offset > f2_strt_offset) {
if (f2_strt_offset > f1_end_offset) {
if (!check_all_zero (ptr_f1, (f1_strt_offset - f2_strt_offset))) {
ret = sign1;
goto over;
}
ptr_f1 += (f1_strt_offset - f2_strt_offset);
f1_strt_offset -= (f1_strt_offset - f2_strt_offset);
} else {
if (!check_all_zero (ptr_f1, (f1_strt_offset - f1_end_offset))) {
ret = sign1;
goto over;
}
ptr_f1 += (f1_strt_offset - f1_end_offset);
f1_strt_offset -= (f1_strt_offset - f1_end_offset);
}
} else if (f2_strt_offset > f1_strt_offset) {
if (f1_strt_offset > f2_end_offset) {
if (!check_all_zero (ptr_f2, (f2_strt_offset - f1_strt_offset))) {
ret = -sign1;
goto over;
}
ptr_f2 += (f2_strt_offset - f1_strt_offset);
f2_strt_offset -= (f2_strt_offset - f1_strt_offset);
} else {
if (!check_all_zero (ptr_f2, (f2_strt_offset - f2_end_offset))) {
ret = -sign1;
goto over;
}
ptr_f2 += (f2_strt_offset - f2_end_offset);
f2_strt_offset -= (f2_strt_offset - f2_end_offset);
}
}

/************************************************************/
/* the second would be to check the overlapping positions */
/* in both fields against each other. if they do not */
/* match then we can set the return code to indicate */
/* which is greater / lesser and we are done. */
/************************************************************/

if (f1_strt_offset == f2_strt_offset) {
len = (f1_end_offset > f2_end_offset) ? (f1_strt_offset - f1_end_offset)
: (f2_strt_offset - f2_end_offset);
if (len > 0) {
compare_result = compare_data (ptr_f1, ptr_f2, len);
if (compare_result) {
ret = sign1 * compare_result;
goto over;
}
ptr_f1 += len;
f1_strt_offset -= len;
ptr_f2 += len;
f2_strt_offset -= len;
}
}

/************************************************************/
/* if we have not exited yet that means that one of the */
/* fields have trailing positions to the right of the */
/* overlapping positions, so we have to check those for */
/* all zeros. then we are done. */
/************************************************************/

if (f1_end_offset < f2_end_offset) {
if (!check_all_zero (ptr_f1, (f1_strt_offset - f1_end_offset))) {
ret = sign1;
goto over;
}
} else if (f2_end_offset < f1_end_offset) {
if (!check_all_zero (ptr_f2, (f2_strt_offset - f2_end_offset))) {
ret = -sign1;
goto over;
}
}
ret = 0;

over:
COB_PUT_SIGN_ADJUSTED (f1, orig_sign1);
COB_PUT_SIGN_ADJUSTED (f2, orig_sign2);
return ret;
}
# else

int
cob_numeric_display_cmp (cob_field *f1, cob_field *f2)
{
/* Fallback: internal decimal compare (most expensive) */
cob_decimal_set_field (&cob_d1, f1);
cob_decimal_set_field (&cob_d2, f2);
return cob_decimal_cmp (&cob_d1, &cob_d2);
}
#endif

/* compare USAGE DISPLAY numeric variable with no SIGN SEPARATE
and no invalid ebcdic overpunch to zero */
int
cob_numeric_display_cmp_zero (cob_field *f)
{
unsigned char *ptr = f->data;
unsigned char *ptr_end = ptr + f->size;

do {
const unsigned char p = COB_D2I (*ptr);
if (p != 0) {
const char p_sign = *(ptr_end - 1) & 0xF0;
int sign = p_sign == 0x70 || p_sign == 0xD0 || p_sign == 0xB0
? -1 : 1;
return sign;
}
ptr++;
} while (ptr < ptr_end);

return 0;
}

/* compare BCD field to zero */
int
cob_bcd_cmp_zero (cob_field *f)
{
unsigned char nullbuff[(COB_MAX_DIGITS / 2) + 1] = { 0 };

if (COB_FIELD_NO_SIGN_NIBBLE(f)) {
if (memcmp (f->data, nullbuff, f->size)) {
return 1;
}
} else {
if (memcmp (f->data, nullbuff, f->size - 1)) {
return cob_packed_get_sign (f) == -1 ? -1 : 1;
}
/* nonzero if byte with sign nibble has content in data nibble */
if ((*(f->data + f->size - 1) & 0xF0) != 0) {
return cob_packed_get_sign (f) == -1 ? -1 : 1;
}
}
return 0;
}

int
cob_numeric_cmp (cob_field *f1, cob_field *f2)
{
@@ -4045,6 +4288,15 @@ cob_numeric_cmp (cob_field *f1, cob_field *f2)
}
#endif

#ifndef NO_DISPLAY_COMPARE
/* do numeric display compare if possible */
if (f1_type == COB_TYPE_NUMERIC_DISPLAY
&& f2_type == COB_TYPE_NUMERIC_DISPLAY) {
/* CHECKME: if not float then create a temporary display2 */
return cob_numeric_display_cmp (f1, f2);
}
#endif

/* float needs special comparison */
if ( (f1_type >= COB_TYPE_NUMERIC_FLOAT
&& f1_type <= COB_TYPE_NUMERIC_L_DOUBLE)

0 comments on commit 47ffbd8

Please sign in to comment.