Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Float rounding error: 0.95 + 0.05 = 0 in some cases #195

Closed
dansanderson opened this issue Feb 13, 2025 · 9 comments
Closed

Float rounding error: 0.95 + 0.05 = 0 in some cases #195

dansanderson opened this issue Feb 13, 2025 · 9 comments
Assignees
Labels
bug Something isn't working

Comments

@dansanderson
Copy link
Collaborator

Test Environment (required)
You can use MEGA65INFO to retrieve this.

  • Platform: MEGA65
  • ROM Release: 920395, 920411
  • Core Commit: 3c10488

Describe the bug

10 FOR I=0 TO 1.1 STEP 0.05
20 PRINT I
30 NEXT I

outputs "0" instead of "1" in the list:

...
0.85
0.9
0.95
0
1.05
1.1

Note that replacing line 20 with PRINT I*10 still shows 0 in that spot, so it might not just be rounding incorrectly during PRINT.

It's not specific to FOR STEP. This fails in the same way:

10 I=0
20 PRINT I
30 IF I>1.1 THEN END
40 I=I+0.05
50 GOTO 20

PRINT 0.95+0.05 outputs the correct result.

@dansanderson dansanderson added new New report, not classified yet bug Something isn't working and removed new New report, not classified yet labels Feb 13, 2025
@dansanderson
Copy link
Collaborator Author

dansanderson commented Feb 13, 2025

Start at 0.8 and increment by 0.05 and it'll hit 1 correctly. Floating point: ex=80 ma=7ffffffe -> "1"

Start at 0.75 and increment by 0.05 and it trips when it should be 1. Floating point: ex=80 ma=7ffffffc -> "0"

As expected, this explains why it's able to continue to 1.05: it's accruing 0.05 steps using the inherently inaccurate floating point format, then mis-handling a narrow range of mantissa values.

10 bank 0
20 i=0.75
30 a=pointer(i)
40 print i;" ",
50 for x=0 to 4:print right$(hex$(peek(a+x)),2);:next x
60 print
70 if i>1.1 then end
80 i=i+0.05
90 goto 40

@dansanderson
Copy link
Collaborator Author

A simpler way to repro the issue, for faster test iteration:

?.75+.05+.05+.05+.05+.05

Expected: 1
Actual: 0

This specific example stopped working in ROM 920177. (Whether earlier ROMs had float interpretation issues, I don't know yet.) The specific change for this release was not committed to the repo. All changes from 920173 to 920183 are present in this commit: https://github.com/MEGA65/mega65-rom/commit/8b5b60d3b6b5587bdff409861a3315bd674a6b70 Many source formatting changes were introduced in this range, making it difficult to glean what parts of the code have introduced the incorrect behavior.

@dansanderson
Copy link
Collaborator Author

dansanderson commented Feb 15, 2025

In ROM 920176, the float representation test program above (where I'm poking into the I variable) produces the same result. This may suggest that the error is not in the float representation, but in the handling of the result. Notice that the *10 result is also still the same: values roughly interpreted as 0 become 10 after this operation.

It appears something has changed slightly with regards to float mantissa rounding, but a discontinuity where a few increasing mantissa values just before "1" are interpreted as "0" is likely wrong.

@alochmann
Copy link

Maybe connected:
PRINT 0.999999995
(eight 9s) outputs to 0 instead of 1. This hints at a problem with PRINT, rather than addition. (Tried on an older ROM, version 920395, without fixed INT.)

@dansanderson
Copy link
Collaborator Author

Agreed, I assume that's the issue. Multiplying the slightly-off float by 10 (decimal) appears to also result in a 0, but that might be a coincidence that's still accounted for by the decimal conversion in PRINT.

@dansanderson dansanderson self-assigned this Feb 16, 2025
@dansanderson
Copy link
Collaborator Author

Oh crap it's INT again. 😂

The breaking change https://github.com/MEGA65/mega65-rom/commit/8b5b60d3b6b5587bdff409861a3315bd674a6b70 made only one change to fout (which computes the decimal string of the FAC), and it was to replace the old qint with the new facint in the step of the algorithm that rounds after nine decimal places beyond the decimal point. Specifically:

  1. FAC = FAC x 1e9
  2. While FAC < 1e8, FAC = FAC x 10
  3. While FAC >= 1e9, FAC = FAC / 10
  4. Add 0.5 to round up
  5. facint
    ...

@dansanderson
Copy link
Collaborator Author

My apologies, I made a dumb mistake in my test data above: I got the mantissa byte ordering wrong. :P I deleted the errors from my previous messages to avoid confusion.

In the "compact" five-byte format used for float variables, bit 7 of the least significant byte is the sign bit. With mantissa bytes ordered from MSB to LSB, the erroneous +0.05 sequence looks like this:

Printed value EX M4M3M2M1
0.05 7C CCCCCC4C
0.75 80 00000040
0.8 80 CCCCCC4C
0.85 80 98999959
0.9 80 64666666
0.95 80 30333373
"0" 80 FCFFFF7F
1.05 81 64666606
1.1 81 CACCCC0C
1.15 81 30333313

Incrementing the mantissa by one up to 80 FCFFFF7F shows that nearby values print correctly:

Printed value EX M4M3M2M1
0.98828125 80 FCFFFF7C
0.9921875 80 FCFFFF7D
0.99609375 80 FCFFFF7E
"0" 80 FCFFFF7F
1.0000001 81 FD000000

Sure enough, with FAC set to 80 FCFFFFFF (with the sign bit in a separate byte), fout generates the string " 0". 80 FCFFFFFE generates " 0.99609375". This is slightly different because variable storage clobbers bit 7 of the lsb for the sign bit, which is weird because it's not actually the least significant bit, but whatever.

Stepping through fout with FAC set to 80 FCFFFFFF, everything looks OK until it tries to find the decimal point in rangefix. Either it's starting with an incorrect deccnt value (which helps it find the decimal point) or rangefix is breaking. Just before rangefix, formats = " 1000000000" (correct) and deccnt = 0 (maybe wrong?). It prints "1" correctly in most other cases, so my current guess is something is odd with how fout handles deccnt.

@dansanderson
Copy link
Collaborator Author

dansanderson commented Feb 17, 2025

Now that I know what I'm doing (sorta), I can confirm that in ROM 920176, setting a variable's memory to 80 FCFFFF7F and printing it does print the erroneous "0". In other words, fout has always printed this incorrectly, it's just that a change in ROM 920177 has affected float rounding, and adding 0.05 multiple times now produces a slightly different mantissa that causes fout to produce the wrong output.

As above, I will assume for now that fout ought to represent every mantissa value without discontinuity, and that fout needs to be repaired for an input of 80 FCFFFF7F such that it returns the string "1", as opposed to attempting to roll back to ROM 920176's rounding behavior.

@dansanderson
Copy link
Collaborator Author

dansanderson commented Feb 17, 2025

I fixed a fault with rangefix that was causing ?.75+.05+.05+.05+.05+.05 to render as " 0". This now renders correctly as " 1".

The fout routine converts floating point values to decimal strings, and is meant to accommodate a small amount of inaccuracy in the floating point representation using a couple of rounding steps. These steps were correctly catching the compounding error of the additions in .75+.05+.05+.05+.05+.05, but the difference in rounding introduced in ROM 920177 created different preconditions for rangefix that triggered a bug in how the decimal point was drawn into the final string.

  • ?1: FAC = exponent $81, mantissa $80000000. BCD string: " 0100000000" Decimal point position deccnt = 2. " 01.00000000" normalizes to the final result of " 1".
  • ?.75+.05+.05+.05+.05+.05: FAC = exponent $80, mantissa $FCFFFF7F. BCD string: " 1000000000" Decimal point position deccnt = 0. The bug caused rangefix to overwrite the "1" digit with the decimal point: " 0.00000000" normalizes to " 0".

The change that caused this was a change to float rounding introduced in ROM 920177. Consider this test program:

10 I=-1
20 PRINT I
30 I=I+0.01
40 IF I<0.1 THEN 20

C64, C65, and MEGA65 all miss the 0 due to compounding float inaccuracies, and all software engineers are taught to expect this from floats. It is still notable that the compounding error is just a teeny bit worse after ROM 920177. I don't yet believe there's a reason to try and get the rounding behavior to match exactly, but it's worth remembering in case we discover any other odd behaviors that may have escaped scrutiny.

The good news is that in this case, I'm confident that rangefix had a real bug with a simple fix. It should never be clobbering digits when it's just trying to insert a decimal point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants