Author Topic: QB64 _FLOAT and precision  (Read 3897 times)


Offline jack

  • Seasoned Forum Regular
  • Posts: 408
QB64 _FLOAT and precision
« on: June 28, 2020, 09:10:56 pm »
QB64 sets the FPU control word to double precision at least part of the time, making operations with _FLOAT unreliable at best.
Here's a very short demo for Windows; place c_include.h in the QB64 folder.
c_include.h
Code:
#define __USE_MINGW_ANSI_STDIO 1
#include <stdio.h>
#include <math.h>

void qbsprintf (char *ResultString, char *format, long double *x)
{
sprintf(ResultString, format, *x);
}

int qbsscanf (long double *result, char *InputString)
{
return sscanf(InputString, "%Lf", result);
}

long double qbdiv(long double *x, long double *y)
{
return (*x)/(*y);
}

long double qbdivf(long double *x, long double *y)
{
long double z;
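unsigned short oldcw, extended = 0x37f;
//save the current fpu control word, then set it to extended precision
//(oldcw is written by fstcw, so it is an output; extended is only read by fldcw, so it is an input)
asm volatile (
"fstcw %[oldcw] \n"
"fldcw %[extended] \n"
:[oldcw]"=m"(oldcw)
:[extended]"m"(extended)
);

z=(*x)/(*y);

//restore the saved fpu control word (read-only here, so it is an input operand)
asm volatile (
"fldcw %[oldcw] \n"
:
:[oldcw]"m"(oldcw)
);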
return z;
}
demo.bas
Code:
' c_include.h needs to be in QB64 folder
$CONSOLE:ONLY
_DEST _CONSOLE
DECLARE CUSTOMTYPE LIBRARY ".\c_include"
    SUB qbsprintf (res AS STRING, frmt AS STRING, x AS _FLOAT)
    FUNCTION qbsscanf& (f AS _FLOAT, sfloat AS STRING)
    FUNCTION qbdiv## (x AS _FLOAT, y AS _FLOAT)
    FUNCTION qbdivf## (x AS _FLOAT, y AS _FLOAT)
END DECLARE

DIM x AS _FLOAT
DIM y AS _FLOAT
DIM z AS _FLOAT
DIM SHARED strout AS STRING

x = 5##
y = x / 9##
PRINT "QB64 division       "; print_strout(y)

s$ = "5L"
k& = qbsscanf(x, s$)
k& = qbsscanf(y, "9L")
z = qbdiv(x, y)
PRINT "C division low prec "; print_strout(z)
z = qbdivf(x, y)
PRINT "C division ext prec "; print_strout(z)

FUNCTION print_strout$ (x AS _FLOAT)
    strout = "                                            "
    CALL qbsprintf(strout, " %.20Lg", x)
    print_strout$ = strout
END FUNCTION
output
Code:
QB64 division        0.55555555555555558023
C division low prec  0.55555555555555558023
C division ext prec  0.55555555555555555556
I included the sscanf but it's probably not needed
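
For readers without a QB64 install, the gap between the last two output lines can be reproduced in plain C. This is only a sketch, not part of the original demo: the helper names quotient_as_double, quotient_as_long_double and format_20_digits are made up for illustration. It assumes a platform where long double is the 80-bit x87 format and the FPU control word is left at extended precision. Where long double is the same as double, both quotients come out identical.

```c
#include <float.h>
#include <stdio.h>

/* Divide after rounding the operands and the result to double
   (53-bit significand). */
long double quotient_as_double(long double x, long double y)
{
    /* volatile forces a store, so the quotient really is rounded to double */
    volatile double q = (double)x / (double)y;
    return q;
}

/* Divide in long double, keeping whatever extra precision the platform offers. */
long double quotient_as_long_double(long double x, long double y)
{
    volatile long double q = x / y;
    return q;
}

/* Format with 20 significant digits, the same "%.20Lg" format the demo uses. */
void format_20_digits(char *out, size_t out_size, long double value)
{
    snprintf(out, out_size, "%.20Lg", value);
}
```

Calling format_20_digits on quotient_as_double(5.0L, 9.0L) gives 0.55555555555555558023, the same digits as the "QB64 division" and "C division low prec" lines. In other words, those two results are 5/9 rounded to a 53-bit significand.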

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
    • Steve’s QB64 Archive Forum
Re: QB64 _FLOAT and precision
« Reply #1 on: June 28, 2020, 09:18:42 pm »
Haven't we covered all this in the past before, along with solutions to set various system settings if your OS defaults to something you find undesirable?

Quote
At the end of the day, QB64 just translates BAS code to C code.   mingw is the compiler we use to then compile that C code to an EXE.

GENERALLY SPEAKING:  *
G++ 32-bit uses the 80-bit precision X87 FPU math processors by default.
G++ 64-bit uses 64-bit precision SSE2 math processors by default, as they’re much faster.

That gives us a noticeable difference in results as the precision limits are different.  Usually this is a difference of something like 0.000000002 or such, and it’s hardly noticeable — BUT when rounding it can cause a huge change in values.

INT(15.9999999999999999) = 15
INT(16.0000000000000001) = 16

Only .0000000000000002 difference in those values, but their INT value is quite different.
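
The same effect can be sketched in C with the well-known (0.1 + 0.7) * 10 case. The numbers and the function name add_scale_truncate are chosen here for illustration and are not from the post. The sketch assumes IEEE 754 doubles and uses truncation toward zero, which matches INT() for non-negative values.

```c
/* Add, scale, then truncate toward zero, rounding to double after each step. */
long add_scale_truncate(double a, double b, double scale)
{
    volatile double sum = a + b;          /* rounded to double */
    volatile double scaled = sum * scale; /* rounded to double again */
    return (long)scaled;                  /* drops the fractional part */
}
```

add_scale_truncate(0.1, 0.7, 10.0) returns 7, although exact arithmetic would give 8: the double result lands just below 8 and truncation throws away everything after the decimal point.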


* You notice I mentioned GENERALLY SPEAKING above??  That’s because various machines and OSes have different architecture that they default to.  From what I’ve heard, Mac OS X and up all use 64-bit SSE2 processing — even on 32-bit Macs...

If one wants to alter these types of default behaviors, they usually just need to set the proper flags to tell the compiler "I want the slower, 80-bit FPU math, rather than the faster 64-bit SSE2 math".

Quote
A few quick wiki links to help:

https://en.wikipedia.org/wiki/X87

https://en.wikipedia.org/wiki/SSE2

Quote
Differences between x87 FPU and SSE2
FPU (x87) instructions provide higher precision by calculating intermediate results with 80 bits of precision, by default, to minimise roundoff error in numerically unstable algorithms (see IEEE 754 design rationale and references therein). However, the x87 FPU is a scalar unit only whereas SSE2 can process a small vector of operands in parallel.

If codes designed for x87 are ported to the lower precision double precision SSE2 floating point, certain combinations of math operations or input datasets can result in measurable numerical deviation, which can be an issue in reproducible scientific computations, e.g. if the calculation results must be compared against results generated from a different machine architecture. A related issue is that, historically, language standards and compilers had been inconsistent in their handling of the x87 80-bit registers implementing double extended precision variables, compared with the double and single precision formats implemented in SSE2: the rounding of extended precision intermediate values to double precision variables was not fully defined and was dependent on implementation details such as when registers were spilled to memory.
« Last Edit: June 28, 2020, 09:22:58 pm by SMcNeill »
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
    • Steve’s QB64 Archive Forum
Re: QB64 _FLOAT and precision
« Reply #2 on: June 28, 2020, 09:27:51 pm »
Quote
look into the source itself and see if you can correct the issue as it occurs for your system.  The fix for you may be as simple as just using a different set of compiler options with your system, and you can find a list of those switches here: https://linux.die.net/man/1/g++

Some of the flags that seem like they may be relevant to your issue:

-mabi=ibmlongdouble: Change the current ABI to use IBM extended precision long double. This is a PowerPC 32-bit SYSV ABI option.

-mabi=ieeelongdouble: Change the current ABI to use IEEE extended precision long double. This is a PowerPC 32-bit Linux ABI option.

-mlong-double-64 / -mlong-double-128: These switches control the size of "long double" type. A size of 64bit makes the "long double" type equivalent to the "double" type. This is the default.

-mhard-quad-float: Generate output containing quad-word (long double) floating point instructions.

-msoft-quad-float: Generate output containing library calls for quad-word (long double) floating point instructions.

-malign-double / -mno-align-double: Control whether GCC aligns "double", "long double", and "long long" variables on a two word boundary or a one word boundary. Aligning "double" variables on a two word boundary will produce code that runs somewhat faster on a Pentium at the expense of more memory.
On x86-64, -malign-double is enabled by default.

-mfpmath=387: Use the standard 387 floating point coprocessor present in the majority of chips and emulated otherwise. Code compiled with this option will run almost everywhere. The temporary results are computed in 80-bit precision instead of the precision specified by the type, resulting in slightly different results compared to most other chips.
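
One way to see which of these modes a given build ended up with is the standard FLT_EVAL_METHOD macro from <float.h>. The sketch below is illustrative only, and the function names evaluates_in_long_double and describe_eval_method are made up here. Note that the macro reports what the compiler generates. It says nothing about what the x87 control word is set to at run time.

```c
#include <float.h>

/* Returns 1 when the given FLT_EVAL_METHOD value means float and double
   expressions are evaluated in long double precision (the x87 style), else 0. */
int evaluates_in_long_double(int eval_method)
{
    return eval_method == 2;
}

/* Describes the evaluation method this file was compiled with. */
const char *describe_eval_method(void)
{
    switch (FLT_EVAL_METHOD) {
    case 0:  return "each operation is rounded to its own type (typical for SSE2)";
    case 1:  return "float and double are evaluated as double";
    case 2:  return "everything is evaluated as long double (typical for x87)";
    default: return "indeterminable or implementation-defined";
    }
}
```

Printing describe_eval_method() from a build with and without -mfpmath=387 is a quick check of whether the flag changed the code generation at all.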

Offline jack

  • Seasoned Forum Regular
  • Posts: 408
Re: QB64 _FLOAT and precision
« Reply #3 on: June 28, 2020, 10:01:14 pm »
@SMcNeill
I am not going to read all the irrelevant information you posted, my code is short and simple and the output speaks for itself.
somewhere in an unknown QB64 library function the FPU is set to double, perhaps for performance
« Last Edit: June 28, 2020, 10:03:47 pm by jack »

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
    • Steve’s QB64 Archive Forum
Re: QB64 _FLOAT and precision
« Reply #4 on: June 28, 2020, 10:27:18 pm »
Quote
@SMcNeill
I am not going to read all the irrelevant information you posted, my code is short and simple and the output speaks for itself.
somewhere in an unknown QB64 library function the FPU is set to double, perhaps for performance

If you're not going to read the responses folks give you, why even bother to post at all? 

The issue you're seeing is system-specific and not something which QB64 can set a flag for, for all systems.  It's just like setting a global -no-pie flag for Linux users -- for some, it corrects the problem with their programs not being openable via clicking on them.  For others, it screws up their compiler and generates compilation errors.  The user needs to set the flag if their system requires it, manually.

All I can say at this point is: Float doesn't work as you want.  Sorry.  I honestly doubt anyone is going to "fix" it anytime soon, so you'll just have to find a workaround for your programs.  That may require you to use string math, a different language, or learning to ignore _FLOAT and make do with DOUBLE instead.

Good luck finding what works for you, since none of the compiler options/flags seem to make a difference.

Offline Cobalt

  • QB64 Developer
  • Forum Resident
  • Posts: 878
  • At 60 I become highly radioactive!
Re: QB64 _FLOAT and precision
« Reply #5 on: June 29, 2020, 12:33:15 am »
Pretty sure you can run the same code on different machines and they will all give you different results, depending on CPU, OS, BIOS, even RAM amount and type.

If you have found something that works for you and your system and returns the results you want, super! Don't be rude when someone else points out that the horse is not only already beaten and dead but cremated too.

That is an interesting solution though, how did you come by creating that code?
Granted after becoming radioactive I only have a half-life!

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
    • Steve’s QB64 Archive Forum
Re: QB64 _FLOAT and precision
« Reply #6 on: June 29, 2020, 02:43:43 am »
Here's a simple solution which works.

First, the c_include:

Code:
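// 0x37F sets the x87 precision-control bits to extended precision (64-bit significand)
void set_dpfpu() { unsigned short mode = 0x37F; asm volatile ("fldcw %0" : : "m" (mode)); }
// 0x27F sets them to double precision (53-bit significand), the behaviour reported for QB64 in this thread
void set_qbfpu() { unsigned short mode = 0x27F; asm volatile ("fldcw %0" : : "m" (mode)); }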

Copy and save the above as "c_include.h" in your QB64 folder.

Then the QB64 code to test:

Code: QB64:
' c_include.h needs to be in QB64 folder
DECLARE CUSTOMTYPE LIBRARY ".\c_include"
    SUB set_dpfpu 'switch the FPU to extended precision (control word 0x37F)
    SUB set_qbfpu 'switch back to what most folks will see with QB64 default math (control word 0x27F)
END DECLARE

DIM x AS _FLOAT, y AS _FLOAT 'without this, x and y default to SINGLE

'Let's print our results without screwing with anything first.
x = 5##
y = x / 9##
PRINT USING "QB64 division       #.####################"; y

'Set the extended precision math
set_dpfpu
x = 5##
y = x / 9##
PRINT USING "QB64 division       #.####################"; y

'Set the QB64 precision math
set_qbfpu
x = 5##
y = x / 9##
PRINT USING "QB64 division       #.####################"; y

I won't swear that it'll work on everybody's system, but it works on mine at least.
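
For anyone wondering what the two magic numbers mean: bits 8 and 9 of the x87 control word are the precision-control field. A small C sketch that decodes it follows. The function name x87_significand_bits is made up for illustration, and it only inspects a value, it does not touch the FPU.

```c
/* Number of significand bits selected by the precision-control field
   (bits 8 and 9) of an x87 control word; 0 for the reserved encoding. */
int x87_significand_bits(unsigned short control_word)
{
    switch ((control_word >> 8) & 0x3) {
    case 0:  return 24; /* single precision */
    case 2:  return 53; /* double precision */
    case 3:  return 64; /* extended precision */
    default: return 0;  /* encoding 1 is reserved */
    }
}
```

x87_significand_bits(0x37F) returns 64 and x87_significand_bits(0x27F) returns 53, so 0x37F is the extended-precision setting and 0x27F is the double-precision one.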