QB64.org Forum

Active Forums => QB64 Discussion => Topic started by: bplus on February 23, 2020, 10:22:15 am

Title: How is a QB64 exe file made?
Post by: bplus on February 23, 2020, 10:22:15 am
I read this description from Aurel at Syntax Bomb:
https://www.syntaxbomb.com/index.php/topic,6677.msg347040096.html#msg347040096

Quote
In fact both of them are interpreters with one difference.
QB64 compiles into bytecode, and this bytecode is then bound or added to the QB64 runtime interpreter, which forms
one standalone exe file, so it looks like it is compiled into machine code.

This is how SdlBasic works, and maybe Just Basic, because it tokenizes the file for the .exe, but I am under the impression QB64 is completely compiled through C+ into an .exe before it can be run at all.
Title: Re: How is a QB64 exe file made?
Post by: STxAxTIC on February 23, 2020, 10:36:23 am
Aurel is talking through his nose.
Title: Re: How is a QB64 exe file made?
Post by: bplus on February 23, 2020, 10:41:31 am
Thanks, is my description accurate (for its length)?

I don't want to misrepresent or leave something major unsaid about QB64, especially at another forum.
Title: Re: How is a QB64 exe file made?
Post by: STxAxTIC on February 23, 2020, 10:44:05 am
Being a math guy, I'd say the shortest representation of qb64 is an equation:

Code: QB64: [Select]
QB64 = BASIC + _GL => C++
Title: Re: How is a QB64 exe file made?
Post by: bplus on February 23, 2020, 10:45:49 am
Quote
Being a math guy, I'd say the shortest representation of qb64 is an equation:

Code: QB64: [Select]
QB64 = BASIC + _GL => C++

LOL luv it, so it's C++, two plus signs not one?

Update: Yes, I just checked on the Internet: C+ is a grade. :)
Title: Re: How is a QB64 exe file made?
Post by: Aurel on February 23, 2020, 11:02:15 am
Quote
Aurel is talking through his nose.
That is nice to know.. heh.
And why would that be a secret, or I don't know what else?
All of Java is based on a bytecode interpreter, and so are many other BASIC dialects.
So I don't see any problem with that.
Title: Re: How is a QB64 exe file made?
Post by: luke on February 23, 2020, 04:59:19 pm
There's no bytecode to be seen here: how fancy do you think we are?

QB64 just reads the source file line by line and generates corresponding C++ code directly using something not entirely unlike
Code: [Select]
IF inputelements$(1) = "DRAW" THEN
    PRINT #outfile, "sub_draw(";
    ' do any arguments
    PRINT #outfile, ");"
END IF
And continues on for each statement.

Then you just run that through g++/clang and link it with some runtime libraries, et voilà.
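
For anyone who wants to play with the idea, here is a toy sketch of that line-by-line approach in C++. It is not QB64's actual translator, only an illustration of "read a line, look at the first element, print the matching C++ call". `sub_draw` is the name from the snippet above; `sub_cls`, `splitElements`, `translateLine` and `translateProgram` are made-up names, and the argument handling is deliberately naive (it assumes whitespace-separated elements with no quoted strings).

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split one BASIC-like source line into whitespace-separated elements.
// Toy tokenizer: a real translator must handle quoted strings, commas, operators.
std::vector<std::string> splitElements(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> elements;
    for (std::string word; in >> word;) {
        elements.push_back(word);
    }
    return elements;
}

// Emit one C++ statement for one source line, dispatching on the first element.
// sub_draw comes from the post; sub_cls is a hypothetical stand-in.
std::string translateLine(const std::string& line) {
    const std::vector<std::string> e = splitElements(line);
    if (e.empty()) return "";

    std::string out;
    if (e[0] == "DRAW") {
        out = "sub_draw(";
    } else if (e[0] == "CLS") {
        out = "sub_cls(";
    } else {
        return "// unsupported: " + line;
    }

    // "do any arguments": here they are simply copied through, comma-separated.
    for (std::size_t i = 1; i < e.size(); ++i) {
        if (i > 1) out += ", ";
        out += e[i];
    }
    return out + ");";
}

// Translate a whole program, one line at a time, one output line per input line.
std::string translateProgram(const std::string& source) {
    std::istringstream in(source);
    std::string out;
    for (std::string line; std::getline(in, line);) {
        out += translateLine(line) + "\n";
    }
    return out;
}
```

Calling `translateProgram` on a two-line source of `CLS` and `DRAW moves` gives `sub_cls();` and `sub_draw(moves);`, which is the kind of text that would then be handed to g++ or clang together with the runtime library.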
Title: Re: How is a QB64 exe file made?
Post by: bplus on February 23, 2020, 09:10:07 pm
(I am) Not really qualified to judge best answer but if Luke doesn't know...
Title: Re: How is a QB64 exe file made?
Post by: FellippeHeitor on February 23, 2020, 09:11:33 pm
Luke just described the whole process, what doesn't he know?
Title: Re: How is a QB64 exe file made?
Post by: bplus on February 23, 2020, 09:15:36 pm
But IF Luke doesn't know THEN we are in deep do-do!
Title: Re: How is a QB64 exe file made?
Post by: FellippeHeitor on February 23, 2020, 09:21:27 pm
Aaaah.
Title: Re: How is a QB64 exe file made?
Post by: Pete on February 24, 2020, 02:50:03 am
And all the while I thought it was done with unicorns and pixie dust. Son of a batch! Oh well. I guess if you're looking for a horny horse left in the dust, you need FreeBASIC for that.

Pete
Title: Re: How is a QB64 exe file made?
Post by: romichess on February 24, 2020, 03:53:03 am
Quote
But IF Luke doesn't know THEN we are in deep do-do!

I just barely made it out of deep do-do a couple of hours ago. I heard water running and after checking the inside of the house for a broken pipe and not finding one I went outside to check the faucet. I never made it to the faucet. The snow was too deep and I fell down and could not get back on my feet. I had to crawl through the snow to get to the driveway and I barely made it. I was almost too weak to stand but somehow made it to my feet. I was extremely hyperventilating and my heart was pounding furiously in my chest. I thought that I was going to pass out. But, somehow I made it into the house and to my bed where I stayed for probably over an hour. Then I got up and went down into the basement to turn off the water. Then of course I sat down in my chair, turned on my computer and visited one of my favorite internet sites, this one! One must have priorities. :)

Concerning the OP and the question of pure interpreter, compiled to bytecode, or compiled to native CPU code (either JIT or to an exe beforehand): there was never any question in my mind. QB64 executes way too fast to be any type of interpreter. The only question for me is who can write faster C++ code, QB64 or myself. That is complicated by the fact that if the BASIC code is not fast then the C++ code is also most likely not going to be fast.

I personally do not trust compilers, and with good reason. One optimized C routine I wrote compiled to, iirc, about 23 machine language instructions. I hand coded it using only 6 machine instructions. Just last week I wrote a function in C++ that looked fast but it confused the bejeebers out of all tested compilers except one. The current experimental Clang compiler was able to handle it quite well.

The original C++ function is a new way of computing 64 bit bitboards for move generation in chess. The name is Split Index Super Set Yielding bitboards. Or for short, SISSY bitboards.
Code: QB64: [Select]
' The C++ function
case WQUEEN:
        h->moves[id] = qss[fs][occ.b08.rank1][0]
                     & qss[fs][occ.b08.rank2][1]
                     & qss[fs][occ.b08.rank3][2]
                     & qss[fs][occ.b08.rank4][3]
                     & qss[fs][occ.b08.rank5][4]
                     & qss[fs][occ.b08.rank6][5]
                     & qss[fs][occ.b08.rank7][6]
                     & qss[fs][occ.b08.rank8][7];

' Clang experimental
queenAttacks(int, unsigned long long):
.Lfunc_begin0:
        push    rbx
.Ltmp0:
        mov     rdx, rsi
        mov     rcx, rsi
        mov     r11, rsi
        mov     r10, rsi
        mov     r9, rsi
        mov     r8, rsi
        movzx   ebx, sil
        mov     rax, rsi
.Ltmp1:
        shr     rax, 2
.Ltmp2:
        shr     rdx, 10
        shr     rcx, 18
        shr     r11, 26
        shr     r10, 34
        movsxd  rsi, edi
        shl     rbx, 6
        shl     rsi, 14
        and     eax, 16320
        mov     rax, qword ptr [rsi + rax + qss+8]
        and     rax, qword ptr [rsi + rbx + qss]
        shr     r9, 42
        and     edx, 16320
        and     rax, qword ptr [rsi + rdx + qss+16]
        shr     r8, 50
        and     ecx, 16320
        and     rax, qword ptr [rsi + rcx + qss+24]
        and     r8d, -64
        and     r11d, 16320
        and     rax, qword ptr [rsi + r11 + qss+32]
        and     r10d, 16320
        and     rax, qword ptr [rsi + r10 + qss+40]
        and     r9d, 16320
        and     rax, qword ptr [rsi + r9 + qss+48]
        and     rax, qword ptr [rsi + r8 + qss+56]
        pop     rbx
        ret

' My handwritten assembler

_DATA SEGMENT

bbs STRUCT
r1 BYTE ?
r2 BYTE ?
r3 BYTE ?
r4 BYTE ?
r5 BYTE ?
r6 BYTE ?
r7 BYTE ?
r8 BYTE ?
bbs ENDS

bbu UNION
bbs<>
b64 QWORD ?
bbu ENDS

occ bbu<>

_DATA ENDS

_TEXT SEGMENT

RayAttacks PROC

; rcx = sq
; rdx = address of rss
; r8 = occ

shl rcx, 11 ; sq * 2048
mov occ.b64, r8
add rdx, rcx
movzx r8, occ.r1
movzx r9, occ.r2
mov rax, [rdx + r8 * 8]
mov rcx, [rdx + r9 * 8 + 131072]
movzx r8, occ.r3
movzx r9, occ.r4
and rax, [rdx + r8 * 8 + (2 * 131072)]
and rcx, [rdx + r9 * 8 + (3 * 131072)]
movzx r8, occ.r5
movzx r9, occ.r6
and rax, [rdx + r8 * 8 + (4 * 131072)]
and rcx, [rdx + r9 * 8 + (5 * 131072)]
movzx r8, occ.r7
movzx r9, occ.r8
and rax, [rdx + r8 * 8 + (6 * 131072)]
and rcx, [rdx + r9 * 8 + (7 * 131072)]
and rax, rcx
ret

RayAttacks ENDP

_TEXT ENDS

I'd put my handwritten assembler up against Clang any day of the week.

This was just in case there is anyone that would be interested in the workings behind the scene.
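
For readers who want to see why ANDing one table entry per rank gives the right answer, here is a small self-checking sketch in C++. It is only one reading of the "split index, super set" idea: the table layout, the function names (`slowQueenAttacks`, `buildSupersets`, `splitIndexQueenAttacks`) and the square numbering (a1 = 0, h8 = 63) are assumptions made for illustration and are not taken from the engine above. Each table entry holds the queen attacks that would result if only one rank held blockers; every such set is a superset of the true attack set, and the rank holding the nearest blocker on a ray cuts that ray at exactly the right square, so the intersection over all eight ranks is the true attack set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bitboard = std::uint64_t;

// Reference generator: walk each of the 8 rays from sq and stop at the
// first blocker (the blocker square itself counts as attacked).
Bitboard slowQueenAttacks(int sq, Bitboard occ) {
    assert(sq >= 0 && sq < 64);
    static const int dirs[8][2] = {{1, 0}, {-1, 0}, {0, 1}, {0, -1},
                                   {1, 1}, {1, -1}, {-1, 1}, {-1, -1}};
    Bitboard attacks = 0;
    for (const auto& d : dirs) {
        int r = sq / 8 + d[0];
        int f = sq % 8 + d[1];
        while (r >= 0 && r < 8 && f >= 0 && f < 8) {
            const Bitboard bit = Bitboard{1} << (r * 8 + f);
            attacks |= bit;
            if (occ & bit) break;
            r += d[0];
            f += d[1];
        }
    }
    return attacks;
}

// table[(sq * 8 + rank) * 256 + byte] = attacks from sq when the only
// blockers on the board are the bits of `byte` placed on `rank`.
// Hypothetical layout: 64 * 8 * 256 entries, about 1 MiB.
std::vector<Bitboard> buildSupersets() {
    std::vector<Bitboard> table(64 * 8 * 256);
    for (int sq = 0; sq < 64; ++sq)
        for (int rank = 0; rank < 8; ++rank)
            for (int byte = 0; byte < 256; ++byte)
                table[(sq * 8 + rank) * 256 + byte] =
                    slowQueenAttacks(sq, Bitboard(byte) << (rank * 8));
    return table;
}

// Split the 64-bit occupancy into its 8 rank bytes, use each byte as an
// index, and AND the eight supersets together.
Bitboard splitIndexQueenAttacks(const std::vector<Bitboard>& table,
                                int sq, Bitboard occ) {
    assert(sq >= 0 && sq < 64);
    Bitboard attacks = ~Bitboard{0};
    for (int rank = 0; rank < 8; ++rank) {
        const std::size_t byte = (occ >> (rank * 8)) & 0xFF;
        attacks &= table[(sq * 8 + rank) * 256 + byte];
    }
    return attacks;
}
```

The table is built once at startup with the slow generator. After that, a lookup is eight loads and seven ANDs, with no multiply, which is the shape of both the C++ `case WQUEEN:` expression and the assembler above.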
Title: Re: How is a QB64 exe file made?
Post by: bplus on February 24, 2020, 12:59:49 pm
Quote
Then of course I sat down in my chair, turned on my computer and visited one of my favorite internet sites, this one! One must have priorities. :)

LOL thank goodness we can laugh about this now.

hmm... love Basic and assembler, Basic + assembler...

@romichess
You might be interested in MasmBasic developed by jj2007 (I met at Retro (RIP) and member at FreeBasic forum)
 https://retrobasic.allbasic.info/index.php?topic=358.msg2397#msg2397
links still work.
Title: Re: How is a QB64 exe file made?
Post by: romichess on February 24, 2020, 03:58:45 pm
Quote
You might be interested in MasmBasic developed by jj2007 (I met at Retro (RIP) and member at FreeBasic forum)
 https://retrobasic.allbasic.info/index.php?topic=358.msg2397#msg2397

Just waiting on the plumber to show up. I like the idea of a basic that outputs assembler. That way after having a working program I could rewrite sections one at a time for better performance. But for me and my chess programming goals only 64 bit assembler will do. Thanks for the link though!
Title: Re: How is a QB64 exe file made?
Post by: _vince on February 24, 2020, 08:18:05 pm
Quote
I like the idea of a basic that outputs assembler. That way after having a working program I could rewrite sections one at a time for better performance.
freebasic can output the full asm file right before it gets assembled, but it also has inline assembly support so you'd never care to modify it anyway (Not that you should ever try to hand optimize modern asm)

Here's an excerpt:
Code: [Select]
#lang "fblite"
const sw = 800
const sh = 600
dim shared as double pi = 2*asin(1)

screenres sw, sh, 32

i=0
pset (200*cos(2*pi*i/5) + sw/2, sh/2 - 200*sin(2*pi*i/5))
for i=0 to 5
line -(200*cos(2*pi*i*2/5) + sw/2, sh/2 - 200*sin(2*pi*i*2/5))
next

sleep
system

excerpt from the .asm file
Code: [Select]
main:
.LFB0:
.file 1 "star.bas"
.loc 1 1 1
.cfi_startproc
push rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
mov rbp, rsp
.cfi_def_cfa_register 6
sub rsp, 48
mov DWORD PTR -36[rbp], edi
mov QWORD PTR -48[rbp], rsi
.loc 1 1 1
mov rax, QWORD PTR fs:40
mov QWORD PTR -8[rbp], rax
xor eax, eax
.loc 1 1 2
mov DWORD PTR -20[rbp], 0
.loc 1 1 2
mov QWORD PTR -16[rbp], 0
.loc 1 1 2
mov rcx, QWORD PTR -48[rbp]
mov eax, DWORD PTR -36[rbp]
mov edx, 2
mov rsi, rcx
mov edi, eax
call fb_Init@PLT
.L2:
.loc 1 6 2
mov r9d, 0
mov r8d, 0
mov ecx, 1
mov edx, 32
mov esi, 600
mov edi, 800
call fb_GfxScreenRes@PLT
.loc 1 8 6
mov QWORD PTR -16[rbp], 0
.loc 1 9 157
mov rax, QWORD PTR -16[rbp]
cvtsi2sd xmm1, rax
.loc 1 9 155
movsd xmm0, QWORD PTR PI$[rip]
mulsd xmm0, xmm1
.loc 1 9 170
addsd xmm0, xmm0
.loc 1 9 133
movsd xmm1, QWORD PTR .LC0[rip]
divsd xmm0, xmm1
call sin@PLT
movapd xmm1, xmm0
.loc 1 9 195
movsd xmm0, QWORD PTR .LC1[rip]
mulsd xmm1, xmm0
.loc 1 9 207
movsd xmm0, QWORD PTR .LC2[rip]
subsd xmm0, xmm1
.loc 1 9 2
cvtsd2ss xmm4, xmm0
movss DWORD PTR -40[rbp], xmm4
.loc 1 9 60
mov rax, QWORD PTR -16[rbp]
cvtsi2sd xmm1, rax
.loc 1 9 58
movsd xmm0, QWORD PTR PI$[rip]
mulsd xmm0, xmm1
.loc 1 9 73
addsd xmm0, xmm0
.loc 1 9 36
movsd xmm1, QWORD PTR .LC0[rip]
divsd xmm0, xmm1
call cos@PLT
movapd xmm1, xmm0
.loc 1 9 98
movsd xmm0, QWORD PTR .LC1[rip]
mulsd xmm1, xmm0
.loc 1 9 110
movsd xmm0, QWORD PTR .LC3[rip]
addsd xmm0, xmm1
.loc 1 9 2
cvtsd2ss xmm0, xmm0
mov ecx, 0
mov edx, -2147483644
mov esi, 0
movss xmm1, DWORD PTR -40[rbp]
mov edi, 0
call fb_GfxPset@PLT
.loc 1 10 7
mov QWORD PTR -16[rbp], 0
.L3:
.loc 1 11 177
mov rax, QWORD PTR -16[rbp]
cvtsi2sd xmm1, rax
.loc 1 11 175
movsd xmm0, QWORD PTR PI$[rip]
mulsd xmm1, xmm0
.loc 1 11 190
movsd xmm0, QWORD PTR .LC4[rip]
mulsd xmm0, xmm1
.loc 1 11 153
movsd xmm1, QWORD PTR .LC0[rip]
divsd xmm0, xmm1
call sin@PLT
movapd xmm1, xmm0
.loc 1 11 215
movsd xmm0, QWORD PTR .LC1[rip]
mulsd xmm1, xmm0
.loc 1 11 227
movsd xmm0, QWORD PTR .LC2[rip]
subsd xmm0, xmm1
.loc 1 11 4
cvtsd2ss xmm5, xmm0
movss DWORD PTR -40[rbp], xmm5
.loc 1 11 80
mov rax, QWORD PTR -16[rbp]
cvtsi2sd xmm1, rax
.loc 1 11 78
movsd xmm0, QWORD PTR PI$[rip]
mulsd xmm1, xmm0
.loc 1 11 93
movsd xmm0, QWORD PTR .LC4[rip]
mulsd xmm0, xmm1
.loc 1 11 56
movsd xmm1, QWORD PTR .LC0[rip]
divsd xmm0, xmm1
call cos@PLT
movapd xmm1, xmm0
.loc 1 11 118
movsd xmm0, QWORD PTR .LC1[rip]
mulsd xmm1, xmm0
.loc 1 11 130
movsd xmm0, QWORD PTR .LC3[rip]
addsd xmm0, xmm1
.loc 1 11 4
cvtsd2ss xmm0, xmm0
mov r8d, -2147483646
mov ecx, 65535
mov edx, 0
mov esi, 0
movss xmm3, DWORD PTR -40[rbp]
movaps xmm2, xmm0
pxor xmm1, xmm1
pxor xmm0, xmm0
mov edi, 0
call fb_GfxLine@PLT
.L4:
.loc 1 12 13
mov rax, QWORD PTR -16[rbp]
add rax, 1
.loc 1 12 7
mov QWORD PTR -16[rbp], rax
.L5:
.loc 1 12 11
mov rax, QWORD PTR -16[rbp]
.loc 1 12 5
cmp rax, 5
jg .L10
.loc 1 12 20 discriminator 2
jmp .L3
.L10:
.loc 1 12 3
nop
.L6:
.loc 1 14 2
mov edi, -1
call fb_Sleep@PLT
.loc 1 15 2
mov edi, 0
call fb_End@PLT
.L7:
.loc 1 15 2
mov edi, 0
call fb_End@PLT
.loc 1 15 9
mov eax, DWORD PTR -20[rbp]
.loc 1 15 1
mov rdx, QWORD PTR -8[rbp]
xor rdx, QWORD PTR fs:40
je .L9
call __stack_chk_fail@PLT
.L9:
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc

It can also output a C source file. It is a sort of 'low level C' that allows freebasic to be multiplatform, though I'm not an expert on freebasic internals. Quite interesting, though.
Code: [Select]
typedef   signed char       int8;
typedef unsigned char      uint8;
typedef   signed short      int16;
typedef unsigned short     uint16;
typedef   signed int        int32;
typedef unsigned int       uint32;
typedef   signed long long  int64;
typedef unsigned long long uint64;
typedef struct { char *data; int64 len; int64 size; } FBSTRING;
typedef int8 boolean;
#line 15 "star.bas"
void fb_GfxPset( void*, float, float, uint32, int32, int32 );
#line 15 "star.bas"
void fb_GfxLine( void*, float, float, float, float, uint32, int32, uint32, int32 );
#line 15 "star.bas"
int32 fb_GfxScreenRes( int32, int32, int32, int32, int32, int32 );
#line 15 "star.bas"
void fb_Init( int32, uint8**, int32 );
#line 15 "star.bas"
void fb_End( int32 );
#line 15 "star.bas"
void fb_End( int32 );
#line 15 "star.bas"
void fb_Sleep( int32 );
#line 15 "star.bas"
static double PI$ = 0x1.921FB54442D18p+1;

#line 1 "star.bas"
int32 main( int32 __FB_ARGC__$0, char** __FB_ARGV__$0 )
#line 1 "star.bas"
{
#line 1 "star.bas"
int32 fb$result$0;
#line 1 "star.bas"
__builtin_memset( &fb$result$0, 0, 4ll );
#line 1 "star.bas"
int64 I$0;
#line 1 "star.bas"
__builtin_memset( &I$0, 0, 8ll );
#line 1 "star.bas"
fb_Init( __FB_ARGC__$0, (uint8**)__FB_ARGV__$0, 2 );
#line 1 "star.bas"
label$0:;
// #lang "fblite"
// const sw = 800
// const sh = 600
// dim shared as double pi = 2*asin(1)
// screenres sw, sh, 32
#line 6 "star.bas"
fb_GfxScreenRes( 800, 600, 32, 1, 0, 0 );
// i=0
#line 8 "star.bas"
I$0 = 0ll;
// pset (200*cos(2*pi*i/5) + sw/2, sh/2 - 200*sin(2*pi*i/5))
#line 9 "star.bas"
fb_GfxPset( (void*)0ull, (float)((__builtin_cos( (((PI$ * (double)I$0) * 0x1.p+1) / 0x1.4p+2) ) * 0x1.9p+7) + 0x1.9p+8), (float)(-(__builtin_sin( (((PI$ * (double)I$0) * 0x1.p+1) / 0x1.4p+2) ) * 0x1.9p+7) + 0x1.2Cp+8), 0u, -2147483644, 0 );
// for i=0 to 5
{
#line 10 "star.bas"
I$0 = 0ll;
#line 10 "star.bas"
label$5:;
{
// line -(200*cos(2*pi*i*2/5) + sw/2, sh/2 - 200*sin(2*pi*i*2/5))
#line 11 "star.bas"
fb_GfxLine( (void*)0ull, 0x0p+0f, 0x0p+0f, (float)((__builtin_cos( (((PI$ * (double)I$0) * 0x1.p+2) / 0x1.4p+2) ) * 0x1.9p+7) + 0x1.9p+8), (float)(-(__builtin_sin( (((PI$ * (double)I$0) * 0x1.p+2) / 0x1.4p+2) ) * 0x1.9p+7) + 0x1.2Cp+8), 0u, 0, 65535u, -2147483646 );
// next
}
#line 12 "star.bas"
label$3:;
#line 12 "star.bas"
I$0 = I$0 + 1ll;
#line 12 "star.bas"
label$2:;
#line 12 "star.bas"
if( I$0 <= 5ll ) goto label$5;
#line 12 "star.bas"
label$4:;
}
// sleep
#line 14 "star.bas"
fb_Sleep( -1 );
// system
#line 15 "star.bas"
fb_End( 0 );
#line 15 "star.bas"
label$1:;
#line 15 "star.bas"
fb_End( 0 );
#line 15 "star.bas"
return fb$result$0;
#line 15 "star.bas"
}
Title: Re: How is a QB64 exe file made?
Post by: romichess on February 24, 2020, 09:26:11 pm
Quote
freebasic can output the full asm file right before it gets assembled, but it also has inline assembly support so you'd never care to modify it anyway (Not that you should ever try to hand optimize modern asm)

For 99.9% of the use cases out there you are 100% correct. There is no longer a need for handwritten assembler. However, there are special cases where handwritten assembler is a must, and I gave one in my code example above. For the last 12 years, give or take, magic bitboards have been untouchable for speed in chess move generation. A lookup is just 7 machine language instructions for a rook or a bishop and 14 (R + B) for the queen. That would forever be unbeatable if it were not for two (three on Intel) facts. First, the 7 instructions form a dependency chain, which makes the code unfriendly to dual execution pipes. Second, it has one imul instruction, which is slightly more expensive even on today's processors. The third, for Intel, is that Intel CPUs have slow shift instructions, and magic uses several shifts.

Now, my handwritten assembler for my new bitboard approach has 20 instructions, but they are split between two dependency chains that have zero cross dependencies, and thus will run in both pipes as though they were only 10 instructions running in one pipe. Overall they are also faster instructions, with no imul and only one shift. And that is why in special cases handwritten assembler can still be superior when it counts! :)
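
To make the dependency-chain point concrete, here is a minimal C++ sketch of the two shapes being compared. It is an illustration only. `magicIndex` shows the usual mask, multiply, shift form of a magic index, with `mask`, `magic` and `shift` passed in as placeholders (real engines use precomputed per-square constants, none of which are given here). `andTwoChains` shows eight values reduced through two accumulators that never touch each other until the last AND, like rax and rcx in the handwritten assembler. Both function names are made up, and a compiler is free to schedule the generated code differently.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Bitboard = std::uint64_t;

// Shape of a magic-bitboard index. Every step needs the previous result,
// so mask, multiply and shift form one serial dependency chain.
// mask, magic and shift are caller-supplied placeholders; shift must be < 64.
inline std::uint64_t magicIndex(Bitboard occ, Bitboard mask,
                                std::uint64_t magic, unsigned shift) {
    const Bitboard relevant = occ & mask;           // step 1
    const std::uint64_t hashed = relevant * magic;  // step 2 waits on step 1 (the imul)
    return hashed >> shift;                         // step 3 waits on step 2
}

// AND eight table entries using two accumulators. The `even` chain and the
// `odd` chain share no data until the final combine, so a CPU with two
// execution pipes can work on both at once.
inline Bitboard andTwoChains(const std::array<Bitboard, 8>& v) {
    Bitboard even = v[0];
    Bitboard odd = v[1];
    for (std::size_t i = 2; i < 8; i += 2) {
        even &= v[i];
        odd &= v[i + 1];
    }
    return even & odd;
}
```

The result of `andTwoChains` is the same as a single left-to-right AND over all eight values. Only the order in which the partial results depend on each other changes.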