Author Topic: Calculation difference between QB64 64 and 32 bit versions (Read 2751 times)

SMcNeill · « **Reply #15 on:** October 02, 2021, 08:10:28 am »

And if you’re curious WHY your value overflows, check out this post here: https://www.qb64.org/forum/index.php?topic=4057.msg133955#msg133955

Quote

Looking at the storage for a Single var, which is 4 bytes which is 32 bits. Sure that doesn't fit.

Remember — with a floating point number, you’re not just storing the integer value itself, but also the sign, decimals, and exponent values…. Check the link above and see if it helps you break down the internal storage process a bit better.

(1 bit is for sign, 8 bits for exponent, 23 bits left for the rest for a SINGLE value.)

https://en.wikipedia.org/wiki/Single-precision_floating-point_format

luke · « **Reply #16 on:** October 02, 2021, 10:00:58 am »

Quote from: zaadstra on October 02, 2021, 07:26:46 am

Here's the part where I probably went wrong. I assumed that the calculation registers are big enough for 'everything' and the var to be stored in limits you to good or overflowed storage. The _Integer64 would be fine.
@SMcNeill 's explanation of 80bit FP for x86 and 64bit for x64 calculations confirms this idea. But the dark magic under the hood thinks different ;-)

So is the compiler breaking up calculations, and does it execute calculations with Single vars in a smaller register? Hence the rounding, as @luke illustrates? Just trying to understand, to be able to prevent these nasty coding errors.

I should emphasise that the results you're seeing with QB64-64 are the "correct" results, in that it's the expected value - so that won't be changing in the future unless someone can convince me otherwise. The intermediate mode of the FPU that Steve mentioned doesn't really come into play, since I'm running the calculations on a fixed development copy (technically a Linux version which never had the bug, but the point stands).

PRINT's decision to use scientific notation isn't really reflective of anything other than a formatting rule, so you can't read into it too much either way.

The rule for the arithmetic operations (+-*/) is basically that the value of lesser type is converted to the greater type, and the result is a value of the greater type. Here "lesser" and "greater" refer to an ordering of the types, which basically runs, from least to greatest: _byte, integer, long, _integer64, single, double, _float. One exception is that the / (division) operator always converts its arguments and returns a floating-point number, for obvious reasons. So in this small program:

Code: QB64: [Select]

Dim a As _Integer64
Dim b As Single
Print a - b
 

The result of the subtraction is of Single type.
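The ordering rule above can be sketched in a few lines of Python (not QB64, just to make the rule concrete). The names `TYPE_ORDER` and `result_type` are made up for this illustration, and the choice of Single as the minimum result type for division is an assumption: the post only says that division returns a floating-point number.

```python
# Ordering of the QB64 numeric types as listed in the post, least to greatest.
TYPE_ORDER = ["_byte", "integer", "long", "_integer64", "single", "double", "_float"]
FIRST_FLOAT = TYPE_ORDER.index("single")


def result_type(left, right, op):
    """Return the type name an arithmetic result would have under the rule.

    Hypothetical helper: the greater of the two operand types wins.
    For "/" the result is forced to be at least Single (an assumption,
    the post does not say which floating-point type is used).
    """
    rank = max(TYPE_ORDER.index(left.lower()), TYPE_ORDER.index(right.lower()))
    if op == "/":
        rank = max(rank, FIRST_FLOAT)
    return TYPE_ORDER[rank]


print(result_type("_Integer64", "Single", "-"))  # single
print(result_type("Long", "Long", "*"))          # long
print(result_type("Single", "Double", "+"))      # double
```

The first call mirrors the small program above: an _Integer64 minus a Single is treated as a Single, even though the _Integer64 occupies more bytes.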

It also turns out that because of the nature of floating point numbers, there are some numbers that can't be represented. For a Single, you have 24 bits (technically 23 + 1 implicit bit) that work as a regular value, so you can store every integer up to 16777216 (2^24). But beyond that you can only store some integers:

Code: QB64: [Select]

b = 16777216
Print Using "########"; b - 1
Print Using "########"; b
Print Using "########"; b + 1
Print Using "########"; b + 2

(Notice because b is a Single, the addition operation gives a Single result.) The number 16777217 simply can't be represented, so the result is rounded to a representable number. In this particular case we could fix the issue by making 1 a Double, which causes b to get converted to a Double and the result to be a Double:

Code: QB64: [Select]

b = 16777216
Print Using "########"; b + 1#
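For anyone who wants to check the 2^24 boundary outside QB64, here is a rough Python sketch. Python floats are doubles, so the helper `to_single` (a name invented for this example) squeezes a value through a 4-byte IEEE 754 single with the `struct` module to imitate storing it in a Single.

```python
import struct


def to_single(x):
    """Round a value to the nearest IEEE 754 single, returned as a Python float."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


b = to_single(16777216)

# Each sum is computed exactly as a double, then rounded to single precision.
for step in (-1, 0, 1, 2):
    print(int(to_single(b + step)))
# prints 16777215, 16777216, 16777216, 16777218

# Keeping the sum as a double (the effect of writing 1# in QB64) avoids the rounding.
print(int(b + 1.0))  # 16777217
```

16777217 sits exactly halfway between two representable singles, and the usual round-to-nearest-even rule picks 16777216.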

Now, it turns out that 31536000 can be stored in a Single... but the result of (tm_year - 70) * 31536000 can't. We can see this using the Print Using trick again:

Code: QB64: [Select]

tm_year = 1978
a = (tm_year - 70) * 31536000
Print Using " ###########"; a
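The same check can be done in Python as a sketch, again using a made-up `to_single` helper that rounds through a 4-byte IEEE 754 single. It assumes the product is rounded once to single precision, which is what the "correct" behaviour described above amounts to.

```python
import struct


def to_single(x):
    """Round a value to the nearest IEEE 754 single, returned as a Python float."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


tm_year = 1978
exact = (tm_year - 70) * 31536000      # exact integer arithmetic
as_single = to_single(exact)           # what fits in a Single

print(to_single(31536000) == 31536000)  # True: the constant itself is representable
print(exact)                            # 60170688000
print(int(as_single))                   # 60170686464
print(exact - int(as_single))           # 1536
```

At this magnitude neighbouring singles are 4096 apart, so the exact product lands between two representable values and gets rounded down by 1536.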

Note that if you just do a straight-up "1908 * 31536000" you'll get a nonsense answer because the constants are of type Long and the result is bigger than will fit in a Long; using a Single or _Integer64 variable causes the type of the intermediate expressions to be greater.
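To get a feel for the kind of "nonsense answer" meant here, this Python sketch wraps the exact product into the signed 32-bit range of a Long. The helper name `wrap_long` is invented, and plain two's-complement wraparound is an assumption: the post does not say exactly what QB64 produces on overflow.

```python
def wrap_long(value):
    """Wrap an integer into the signed 32-bit range, two's-complement style."""
    return (value + 2**31) % 2**32 - 2**31


exact = 1908 * 31536000
print(exact)             # 60170688000
print(wrap_long(exact))  # 41145856
```

The wrapped value bears no obvious relation to the true product, which is why the intermediate type matters.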

It's coming into midnight (or 1am if you count the impending daylight saving change) and I'm not quite sure what the message of the post is anymore. Anyway, if you want to do maths with big numbers and can't accept some rounding, best to make everything an _Integer64 and it should all work out.

zaadstra · « **Reply #17 on:** October 02, 2021, 10:02:41 am »

Thanks for the examples, @SMcNeill.

To clarify my 'note' remark, this was about the new development version where you wrote the cpu FP would be the default, hence calculating slower. In that case I would rather select the (now default) SSE calculations, when speed is needed.

Another question, why would you use the _FLOAT marks for a literal value 5## ? How does this help?

P.S. the program in your link threw me an Illegal function in line 72 (at toggling the first OR the last red bit).
Am I correct that the explained examples only show 31 bits (including the sign bit)? I would expect 32 bits. Now at least I see that a Single var has a lot fewer bits than 32 for the value itself :-)

zaadstra · « **Reply #18 on:** October 02, 2021, 10:28:28 am »

Thanks @luke for the late night answer.

I think that solves all mysteries, the order of variable types.

Looking at the QB64 data types table, I always thought that a (big) integer var is greater than a Single. But it is not (it is lesser).
Finding this was the message of this post if you ask me, I've learned a lot.

SMcNeill · « **Reply #19 on:** October 03, 2021, 12:25:17 am »

Quote from: zaadstra on October 02, 2021, 10:02:41 am

Am I correct that the explained examples only show 31 bits (including the sign bit)? I would expect 32 bits. Now at least I see that a Single var has a lot fewer bits than 32 for the value itself :-)

A SINGLE stores values as:
1-bit for sign.
8-bits for exponent (+127)
23-bits for the mantissa.

So, let’s look at a value for a number such as …. Umm.. 13, just for example.

13 in binary is: 1101, in standard format.

In binary scientific notation, 13 is represented as 1.101E3, meaning 1.101 × 2^3 (it’s midnight here and I’m doing the conversion in my head, so double-check me).

Now that we know the formalized format of the binary value of 13, we can represent it as a SINGLE.

The first bit is 0, as 1 represents the minus sign.
The next 8 bits are 10000010, as they represent the binary value of 130 (our exponent of 3, plus 127).
The last 23 bits are our mantissa, the digits right of the binary point: 101 (and 20 more 0’s).

13 decimal = 1101 in binary = 1.101E3 in formalized format.

0 - 10000010 - 10100000000000000000000000 would represent the 32 bits for how it’s stored in memory as a SINGLE value.

1 bit Sign +8 bit Exponent (plus 127) + 23 bit Mantissa (what’s right of the decimal point) = 32 bit SINGLE representation.
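If you'd rather let the machine do the conversion, this small Python sketch pulls the three fields out of a 4-byte IEEE 754 single. The helper name `single_fields` is made up for the example; `struct` is Python's standard packing module.

```python
import struct


def single_fields(x):
    """Split a value stored as an IEEE 754 single into (sign, exponent, mantissa)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", float(x)))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, stored with a bias of 127
    mantissa = bits & 0x7FFFFF       # 23 bits to the right of the binary point
    return sign, exponent, mantissa


sign, exponent, mantissa = single_fields(13.0)
print(f"{sign:01b} - {exponent:08b} - {mantissa:023b}")
# prints: 0 - 10000010 - 10100000000000000000000
print(exponent - 127)  # 3, the power of two in 1.101 x 2^3
```

Printing the fields with fixed widths (1, 8 and 23) also takes care of counting the zeros.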

zaadstra · « **Reply #20 on:** October 03, 2021, 03:50:29 am »

I guess I should have been more specific. I counted the bits from your examples.

Code: Text: [Select]

And, the value for 3 would be:
0 --- 10000000 --- 1000000000000000000000
 
0100000001000000000000000000000  --> 31 bits
 
0 - 10000010 - 10100000000000000000000000    (example above)
 
01000001010100000000000000000000000  --> 35 bits

So you see I really worked out the examples!
How it works now is very clear, thanks for that. It has been a great learning weekend! :-)

SMcNeill · « **Reply #21 on:** October 03, 2021, 06:02:20 am »

When dealing with 20+ zeros in a row, it’s hard to count them all. LOL!

Ignore any extra bits beyond 32. :P You’ll learn in time: I’m good with the complicated stuff, but can’t count past 1, 3, 2 without messing up.

The concept is there, even if my 0s aren’t. 😂😂

Dimster · « **Reply #22 on:** October 03, 2021, 08:08:47 am »

Gawd, there is so much that goes into this... I'm not that strong in math. I do understand that if I want a variable's value to have 7 digits I declare it as a Single, and a Double will give me 15 digits. So I guess my point is that I'm surprised it's not that simple in the BASIC language. Why all that convolution and scientific notation when a layperson is just looking for a 7-digit answer? I appreciate that where the formula used to set the Single variable produces a number greater than 7 digits, there are rules for displaying it back to 7 digits of accuracy using scientific notation, but why not just truncate to the first 7 digits and call it a day? After all, the programmer is just expecting a 7-digit value. Don't get me wrong, I sometimes do want more accuracy in a calculation, but even if I'm trying to calculate pi I don't think I will ever be absolutely comfortable with the answer produced by BASIC's math limitations.

SMcNeill · « **Reply #23 on:** October 03, 2021, 08:46:29 am »

A whole lot of it, @Dimster, depends on how deep one wants to go into the rabbit hole of computing. Take 1 + 2 = 3…. That’s about as simple as it gets.

But how does the computer do it with switches that can only be ON (1) or OFF (0)?

It converts to binary format behind the scenes:

1 = 00000001
2 = 00000010
------------
    00000011

Then it basically adds those ONs (1s) and OFFs (0s) to produce the 00000011, as above.

And in binary, 00000011 is the value 3 in decimal.
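For the curious, here is a rough Python sketch of adding with nothing but ON/OFF logic. The function name `add_bits` is invented for the illustration; real hardware does the equivalent with adder circuits, not a loop, so treat this as a picture of the idea only.

```python
def add_bits(x, y):
    """Add two non-negative integers using only bitwise operations."""
    while y:
        carry = (x & y) << 1   # positions where both bits are ON produce a carry
        x = x ^ y              # add each position without the carries
        y = carry              # feed the carries back in on the next pass
    return x


total = add_bits(1, 2)
print(format(1, "08b"))      # 00000001
print(format(2, "08b"))      # 00000010
print(format(total, "08b"))  # 00000011
print(total)                 # 3
```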

If all you need to know is what 1 + 2 is, you can stop digging there and just use your calculator app and get 3 as the answer. It’s only if you’re wanting to know *exactly* what’s going on under the hood, that you need to break stuff down to binary addition and conversion.

Same way with working with decimal values. If you just need about 7 significant digits, knowing how to DIM variable AS SINGLE is good enough for that.

The vast majority of programmers never really HAVE to know *exactly* what format syntax is used to store those values internally, but I do wish more would take interest in learning at least enough about the subject to have a basic understanding of why 1/10 is a non-precise value.
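As a quick illustration of the 1/10 point, this Python sketch shows what actually gets stored when 0.1 is squeezed into a 4-byte IEEE 754 single. The helper `to_single` is a made-up name; `Fraction` from the standard library prints the stored value as an exact ratio.

```python
import struct
from fractions import Fraction


def to_single(x):
    """Round a value to the nearest IEEE 754 single, returned as a Python float."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


tenth = to_single(0.1)
print(Fraction(tenth))                     # 13421773/134217728
print(Fraction(tenth) == Fraction(1, 10))  # False
print(tenth > 0.1)                         # True: the stored single is slightly too big
```

The denominator is a power of two (2^27), and no fraction with a power-of-two denominator equals 1/10 exactly, so the stored value can only be a close neighbour.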

All the discussion above is for those who want to delve more into internal data structures, than BASIC itself. If it’s not something you need for your personal projects, nor something you’re particularly interested in, then I’d just skim it, keep it in mind as a “it’s a little more complex than I first imagined” thought, and then I’d move on and not worry about it.

Consider it like playing a video file. I don’t need to know every detail about how an MP4 video is formatted, stored, synced, and captioned in multiple languages. I just need to know enough to double click on them and let my PC play them! I’ll leave it up to the guys who get paid to work on them, to sort out HOW the encoding and all works. It’s just one of those many “rabbit holes” of computer knowledge that I’ve never had to delve very deeply into yet. 😉
