How is a QB64 exe file made?

Active Forums => QB64 Discussion => Topic started by: bplus on February 23, 2020, 10:22:15 am

Title: How is a QB64 exe file made?
Post by: bplus on February 23, 2020, 10:22:15 am

I read this description from Aurel at Syntax Bomb:
https://www.syntaxbomb.com/index.php/topic,6677.msg347040096.html#msg347040096

Quote

In fact both of them are interpreters, with one difference.
QB64 compiles into bytecode, and this bytecode is then bound or added to the QB64 runtime interpreter, which forms
one standalone exe file, so it looks like it is compiled into machine code.

This is how SdlBasic works, and maybe Just Basic, because it tokenizes the file for the .exe. But I am under the impression that QB64 is completely compiled through C+ into an .exe before it can be run at all.

Title: Re: How is a QB64 exe file made?
Post by: STxAxTIC on February 23, 2020, 10:36:23 am

Aurel is talking through his nose.

Title: Re: How is a QB64 exe file made?
Post by: bplus on February 23, 2020, 10:41:31 am

Thanks, is my description accurate (for its length)?

I don't want to misrepresent or leave something major unsaid about QB64, especially at another forum.

Title: Re: How is a QB64 exe file made?
Post by: STxAxTIC on February 23, 2020, 10:44:05 am

Being a math guy, I'd say the shortest representation of qb64 is an equation:

Code: QB64:

QB64 = BASIC + _GL => C++

Title: Re: How is a QB64 exe file made?
Post by: bplus on February 23, 2020, 10:45:49 am

Quote from: STxAxTIC on February 23, 2020, 10:44:05 am

Being a math guy, I'd say the shortest representation of qb64 is an equation:

Code: QB64:
QB64 = BASIC + _GL => C++

LOL luv it, so it's C++, two plus signs not one?

Update: Yes, I just checked on the Internet: C+ is a grade. :)

Title: Re: How is a QB64 exe file made?
Post by: Aurel on February 23, 2020, 11:02:15 am

Quote

Aurel is talking through his nose.

That is nice to know... heh.
And why would that be a secret, or I don't know what else?
All of Java is based on a bytecode interpreter, and so are many BASIC dialects.
So I don't see any problem with that.

Title: Re: How is a QB64 exe file made?
Post by: luke on February 23, 2020, 04:59:19 pm

There's no bytecode to be seen here: how fancy do you think we are?

QB64 just reads the source file line by line and generates corresponding C++ code directly using something not entirely unlike

Code:

IF inputelements$(1) = "DRAW" THEN
    PRINT #outfile, "sub_draw(";
    ' do any arguments
    PRINT #outfile, ");"
END IF

And continues on for each statement.

Then you just run that through g++/clang and link it with some runtime libraries, et voilà.
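A minimal C++ sketch of the same idea, for readers who want to see the shape of such a translator. It is not QB64's actual code: `translate_line` and `translate_program` are hypothetical names, only `DRAW` is recognized, and the arguments are copied through verbatim rather than parsed. The only name taken from the post above is `sub_draw`.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: turn one BASIC-like statement into one line of C++ text.
// Only DRAW is recognized; anything else becomes a comment in the output.
std::string translate_line(const std::string& line) {
    std::istringstream in(line);
    std::string keyword;
    in >> keyword;

    // Everything after the keyword is treated as the raw argument text.
    std::string rest;
    std::getline(in, rest);
    const auto first = rest.find_first_not_of(' ');
    rest = (first == std::string::npos) ? "" : rest.substr(first);

    if (keyword == "DRAW") {
        return "sub_draw(" + rest + ");";
    }
    return "// unsupported: " + line;
}

// Translate a whole source text, one statement per line, into C++ text.
std::string translate_program(const std::string& source) {
    std::istringstream in(source);
    std::string line;
    std::string out;
    while (std::getline(in, line)) {
        out += translate_line(line) + "\n";
    }
    return out;
}
```

The returned text would then be handed to a C++ compiler and linked against a runtime that defines `sub_draw`, which is the step luke describes last.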

Title: Re: How is a QB64 exe file made?
Post by: bplus on February 23, 2020, 09:10:07 pm

(I am) Not really qualified to judge best answer but if Luke doesn't know...

Title: Re: How is a QB64 exe file made?
Post by: FellippeHeitor on February 23, 2020, 09:11:33 pm

Luke just described the whole process, what doesn't he know?

Title: Re: How is a QB64 exe file made?
Post by: bplus on February 23, 2020, 09:15:36 pm

But IF Luke doesn't know THEN we are in deep do-do!

Title: Re: How is a QB64 exe file made?
Post by: FellippeHeitor on February 23, 2020, 09:21:27 pm

Aaaah.

Title: Re: How is a QB64 exe file made?
Post by: Pete on February 24, 2020, 02:50:03 am

And all the while I thought it was done with unicorns and pixie dust. Son of a batch! Oh well. I guess if you're looking for a horny horse left in the dust, you need FreeBASIC for that.

Pete

Title: Re: How is a QB64 exe file made?
Post by: romichess on February 24, 2020, 03:53:03 am

Quote from: bplus on February 23, 2020, 09:15:36 pm

But IF Luke doesn't know THEN we are in deep do-do!

I just barely made it out of deep do-do a couple of hours ago. I heard water running, and after checking the inside of the house for a broken pipe and not finding one, I went outside to check the faucet. I never made it to the faucet. The snow was too deep, and I fell down and could not get back on my feet. I had to crawl through the snow to get to the driveway, and I barely made it. I was almost too weak to stand but somehow made it to my feet. I was hyperventilating badly and my heart was pounding furiously in my chest. I thought that I was going to pass out. But somehow I made it into the house and to my bed, where I stayed for probably over an hour. Then I got up and went down into the basement to turn off the water. Then of course I sat down in my chair, turned on my computer and visited one of my favorite internet sites, this one! One must have priorities. :)

Concerning the OP and the question of pure interpreter, compiled to bytecode, or compiled to native CPU code (either JIT or to an exe beforehand): there was never any question in my mind. QB64 executes way too fast to be any type of interpreter. The only question for me is who can write faster C++ code, QB64 or myself. That is complicated by the fact that if the BASIC code is not fast, then the C++ code is most likely not going to be fast either.

I personally do not trust compilers, and with good reason. One optimized C routine I wrote compiled to, IIRC, about 23 machine language instructions. I coded it by hand using only 6 machine instructions. Just last week I wrote a function in C++ that looked fast, but it confused the bejeebers out of all tested compilers except one. The current experimental Clang compiler was able to handle it quite well.

The original C++ function is a new way of computing 64 bit bitboards for move generation in chess. The name is Split Index Super Set Yielding bitboards. Or for short, SISSY bitboards.

Code: QB64:

' The C++ function
case WQUEEN:
        h->moves[id] = qss[fs][occ.b08.rank1][0] 
                     & qss[fs][occ.b08.rank2][1]
                     & qss[fs][occ.b08.rank3][2]
                     & qss[fs][occ.b08.rank4][3]
                     & qss[fs][occ.b08.rank5][4]
                     & qss[fs][occ.b08.rank6][5]
                     & qss[fs][occ.b08.rank7][6]
                     & qss[fs][occ.b08.rank8][7];
 
' Clang experimental
queenAttacks(int, unsigned long long):                     
.Lfunc_begin0:
        push    rbx
.Ltmp0:
        mov     rdx, rsi
        mov     rcx, rsi
        mov     r11, rsi
        mov     r10, rsi
        mov     r9, rsi
        mov     r8, rsi
        movzx   ebx, sil
        mov     rax, rsi
.Ltmp1:
        shr     rax, 2
.Ltmp2:
        shr     rdx, 10
        shr     rcx, 18
        shr     r11, 26
        shr     r10, 34
        movsxd  rsi, edi
        shl     rbx, 6
        shl     rsi, 14
        and     eax, 16320
        mov     rax, qword ptr [rsi + rax + qss+8]
        and     rax, qword ptr [rsi + rbx + qss]
        shr     r9, 42
        and     edx, 16320
        and     rax, qword ptr [rsi + rdx + qss+16]
        shr     r8, 50
        and     ecx, 16320
        and     rax, qword ptr [rsi + rcx + qss+24]
        and     r8d, -64
        and     r11d, 16320
        and     rax, qword ptr [rsi + r11 + qss+32]
        and     r10d, 16320
        and     rax, qword ptr [rsi + r10 + qss+40]
        and     r9d, 16320
        and     rax, qword ptr [rsi + r9 + qss+48]
        and     rax, qword ptr [rsi + r8 + qss+56]
        pop     rbx
        ret
 
' My handwritten assembler
 
_DATA SEGMENT
 
bbs STRUCT
r1 BYTE ?
r2 BYTE ?
r3 BYTE ?
r4 BYTE ?
r5 BYTE ?
r6 BYTE ?
r7 BYTE ?
r8 BYTE ?
bbs ENDS
 
bbu UNION
bbs<>
b64 QWORD ?
bbu ENDS
 
occ bbu<>
 
_DATA ENDS
 
_TEXT SEGMENT
 
RayAttacks PROC
 
; rcx = sq
; rdx = address of rss
; r8 = occ
 
shl rcx, 11 ; sq * 2048
mov occ.b64, r8
add rdx, rcx
movzx r8, occ.r1
movzx r9, occ.r2
mov rax, [rdx + r8 * 8]
mov rcx, [rdx + r9 * 8 + 131072]
movzx r8, occ.r3
movzx r9, occ.r4
and rax, [rdx + r8 * 8 + (2 * 131072)]
and rcx, [rdx + r9 * 8 + (3 * 131072)]
movzx r8, occ.r5
movzx r9, occ.r6
and rax, [rdx + r8 * 8 + (4 * 131072)]
and rcx, [rdx + r9 * 8 + (5 * 131072)]
movzx r8, occ.r7
movzx r9, occ.r8
and rax, [rdx + r8 * 8 + (6 * 131072)]
and rcx, [rdx + r9 * 8 + (7 * 131072)]
and rax, rcx
ret
 
RayAttacks ENDP
 
_TEXT ENDS
 
END
 
 

I'd put my handwritten assembler up against Clang any day of the week.

This was just in case there is anyone that would be interested in the workings behind the scene.
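For anyone who wants the lookup in a self-contained form, here is a hedged C++ sketch of the AND-of-eight-lookups shape shown in the `WQUEEN` case. The `SuperSetTable` wrapper and the function name are hypothetical, the table layout is assumed to mirror `qss[fs][occ byte][rank]`, and filling the table with real attack sets is not shown. The occupancy bytes are extracted with shifts instead of a union, so the sketch does not depend on byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bitboard = std::uint64_t;

// Hypothetical table wrapper; the layout mirrors qss[fs][occ byte][rank].
struct SuperSetTable {
    // 64 squares x 256 occupancy bytes x 8 ranks, every bit set by default.
    std::vector<Bitboard> data = std::vector<Bitboard>(64 * 256 * 8, ~Bitboard{0});

    static std::size_t index(int square, unsigned occ_byte, int rank) {
        return (static_cast<std::size_t>(square) * 256 + occ_byte) * 8
               + static_cast<std::size_t>(rank);
    }
};

// AND together one table entry per rank, each selected by that rank's
// occupancy byte. How the entries are precomputed is outside this sketch.
Bitboard lookup_attacks(const SuperSetTable& table, int square, Bitboard occupancy) {
    Bitboard result = ~Bitboard{0};
    for (int rank = 0; rank < 8; ++rank) {
        const unsigned occ_byte =
            static_cast<unsigned>((occupancy >> (8 * rank)) & 0xFF);
        result &= table.data[SuperSetTable::index(square, occ_byte, rank)];
    }
    return result;
}
```

The table is 64 * 256 * 8 entries of 8 bytes each, so about 1 MiB, which matches the `shl rsi, 14` per-square stride visible in the Clang output above.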

Title: Re: How is a QB64 exe file made?
Post by: bplus on February 24, 2020, 12:59:49 pm

Quote

Then of course I sat down in my chair, turned on my computer and visited one of my favorite internet sites, this one! One must have priorities. :)

LOL thank goodness we can laugh about this now.

hmm... love Basic and assembler, Basic + assembler...

@romichess
You might be interested in MasmBasic developed by jj2007 (I met at Retro (RIP) and member at FreeBasic forum)
https://retrobasic.allbasic.info/index.php?topic=358.msg2397#msg2397
links still work.

Title: Re: How is a QB64 exe file made?
Post by: romichess on February 24, 2020, 03:58:45 pm

Quote from: bplus on February 24, 2020, 12:59:49 pm

LOL thank goodness we can laugh about this now.

hmm... love Basic and assembler, Basic + assembler...

@romichess
You might be interested in MasmBasic developed by jj2007 (I met at Retro (RIP) and member at FreeBasic forum)
https://retrobasic.allbasic.info/index.php?topic=358.msg2397#msg2397
links still work.

Just waiting on the plumber to show up. I like the idea of a basic that outputs assembler. That way after having a working program I could rewrite sections one at a time for better performance. But for me and my chess programming goals only 64 bit assembler will do. Thanks for the link though!

Title: Re: How is a QB64 exe file made?
Post by: _vince on February 24, 2020, 08:18:05 pm

Quote from: romichess on February 24, 2020, 03:58:45 pm

I like the idea of a basic that outputs assembler. That way after having a working program I could rewrite sections one at a time for better performance.

FreeBASIC can output the full asm file right before it gets assembled, but it also has inline assembly support, so you'd never need to modify it anyway. (Not that you should ever try to hand-optimize modern asm.)

Here's an excerpt:

Code:

#lang "fblite"
const sw = 800
const sh = 600
dim shared as double pi = 2*asin(1)

screenres sw, sh, 32

i=0
pset (200*cos(2*pi*i/5) + sw/2, sh/2 - 200*sin(2*pi*i/5))
for i=0 to 5
	line -(200*cos(2*pi*i*2/5) + sw/2, sh/2 - 200*sin(2*pi*i*2/5))
next

sleep
system

excerpt from the .asm file

Code:

main:
.LFB0:
	.file 1 "star.bas"
	.loc 1 1 1
	.cfi_startproc
	push	rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	mov	rbp, rsp
	.cfi_def_cfa_register 6
	sub	rsp, 48
	mov	DWORD PTR -36[rbp], edi
	mov	QWORD PTR -48[rbp], rsi
	.loc 1 1 1
	mov	rax, QWORD PTR fs:40
	mov	QWORD PTR -8[rbp], rax
	xor	eax, eax
	.loc 1 1 2
	mov	DWORD PTR -20[rbp], 0
	.loc 1 1 2
	mov	QWORD PTR -16[rbp], 0
	.loc 1 1 2
	mov	rcx, QWORD PTR -48[rbp]
	mov	eax, DWORD PTR -36[rbp]
	mov	edx, 2
	mov	rsi, rcx
	mov	edi, eax
	call	fb_Init@PLT
.L2:
	.loc 1 6 2
	mov	r9d, 0
	mov	r8d, 0
	mov	ecx, 1
	mov	edx, 32
	mov	esi, 600
	mov	edi, 800
	call	fb_GfxScreenRes@PLT
	.loc 1 8 6
	mov	QWORD PTR -16[rbp], 0
	.loc 1 9 157
	mov	rax, QWORD PTR -16[rbp]
	cvtsi2sd	xmm1, rax
	.loc 1 9 155
	movsd	xmm0, QWORD PTR PI$[rip]
	mulsd	xmm0, xmm1
	.loc 1 9 170
	addsd	xmm0, xmm0
	.loc 1 9 133
	movsd	xmm1, QWORD PTR .LC0[rip]
	divsd	xmm0, xmm1
	call	sin@PLT
	movapd	xmm1, xmm0
	.loc 1 9 195
	movsd	xmm0, QWORD PTR .LC1[rip]
	mulsd	xmm1, xmm0
	.loc 1 9 207
	movsd	xmm0, QWORD PTR .LC2[rip]
	subsd	xmm0, xmm1
	.loc 1 9 2
	cvtsd2ss	xmm4, xmm0
	movss	DWORD PTR -40[rbp], xmm4
	.loc 1 9 60
	mov	rax, QWORD PTR -16[rbp]
	cvtsi2sd	xmm1, rax
	.loc 1 9 58
	movsd	xmm0, QWORD PTR PI$[rip]
	mulsd	xmm0, xmm1
	.loc 1 9 73
	addsd	xmm0, xmm0
	.loc 1 9 36
	movsd	xmm1, QWORD PTR .LC0[rip]
	divsd	xmm0, xmm1
	call	cos@PLT
	movapd	xmm1, xmm0
	.loc 1 9 98
	movsd	xmm0, QWORD PTR .LC1[rip]
	mulsd	xmm1, xmm0
	.loc 1 9 110
	movsd	xmm0, QWORD PTR .LC3[rip]
	addsd	xmm0, xmm1
	.loc 1 9 2
	cvtsd2ss	xmm0, xmm0
	mov	ecx, 0
	mov	edx, -2147483644
	mov	esi, 0
	movss	xmm1, DWORD PTR -40[rbp]
	mov	edi, 0
	call	fb_GfxPset@PLT
	.loc 1 10 7
	mov	QWORD PTR -16[rbp], 0
.L3:
	.loc 1 11 177
	mov	rax, QWORD PTR -16[rbp]
	cvtsi2sd	xmm1, rax
	.loc 1 11 175
	movsd	xmm0, QWORD PTR PI$[rip]
	mulsd	xmm1, xmm0
	.loc 1 11 190
	movsd	xmm0, QWORD PTR .LC4[rip]
	mulsd	xmm0, xmm1
	.loc 1 11 153
	movsd	xmm1, QWORD PTR .LC0[rip]
	divsd	xmm0, xmm1
	call	sin@PLT
	movapd	xmm1, xmm0
	.loc 1 11 215
	movsd	xmm0, QWORD PTR .LC1[rip]
	mulsd	xmm1, xmm0
	.loc 1 11 227
	movsd	xmm0, QWORD PTR .LC2[rip]
	subsd	xmm0, xmm1
	.loc 1 11 4
	cvtsd2ss	xmm5, xmm0
	movss	DWORD PTR -40[rbp], xmm5
	.loc 1 11 80
	mov	rax, QWORD PTR -16[rbp]
	cvtsi2sd	xmm1, rax
	.loc 1 11 78
	movsd	xmm0, QWORD PTR PI$[rip]
	mulsd	xmm1, xmm0
	.loc 1 11 93
	movsd	xmm0, QWORD PTR .LC4[rip]
	mulsd	xmm0, xmm1
	.loc 1 11 56
	movsd	xmm1, QWORD PTR .LC0[rip]
	divsd	xmm0, xmm1
	call	cos@PLT
	movapd	xmm1, xmm0
	.loc 1 11 118
	movsd	xmm0, QWORD PTR .LC1[rip]
	mulsd	xmm1, xmm0
	.loc 1 11 130
	movsd	xmm0, QWORD PTR .LC3[rip]
	addsd	xmm0, xmm1
	.loc 1 11 4
	cvtsd2ss	xmm0, xmm0
	mov	r8d, -2147483646
	mov	ecx, 65535
	mov	edx, 0
	mov	esi, 0
	movss	xmm3, DWORD PTR -40[rbp]
	movaps	xmm2, xmm0
	pxor	xmm1, xmm1
	pxor	xmm0, xmm0
	mov	edi, 0
	call	fb_GfxLine@PLT
.L4:
	.loc 1 12 13
	mov	rax, QWORD PTR -16[rbp]
	add	rax, 1
	.loc 1 12 7
	mov	QWORD PTR -16[rbp], rax
.L5:
	.loc 1 12 11
	mov	rax, QWORD PTR -16[rbp]
	.loc 1 12 5
	cmp	rax, 5
	jg	.L10
	.loc 1 12 20 discriminator 2
	jmp	.L3
.L10:
	.loc 1 12 3
	nop
.L6:
	.loc 1 14 2
	mov	edi, -1
	call	fb_Sleep@PLT
	.loc 1 15 2
	mov	edi, 0
	call	fb_End@PLT
.L7:
	.loc 1 15 2
	mov	edi, 0
	call	fb_End@PLT
	.loc 1 15 9
	mov	eax, DWORD PTR -20[rbp]
	.loc 1 15 1
	mov	rdx, QWORD PTR -8[rbp]
	xor	rdx, QWORD PTR fs:40
	je	.L9
	call	__stack_chk_fail@PLT
.L9:
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc

It can also output a C source file. It is a sort of 'low-level C' that allows FreeBASIC to be multiplatform. I'm not an expert on FreeBASIC internals, but it is quite interesting.

Code:

typedef   signed char       int8;
typedef unsigned char      uint8;
typedef   signed short      int16;
typedef unsigned short     uint16;
typedef   signed int        int32;
typedef unsigned int       uint32;
typedef   signed long long  int64;
typedef unsigned long long uint64;
typedef struct { char *data; int64 len; int64 size; } FBSTRING;
typedef int8 boolean;
#line 15 "star.bas"
void fb_GfxPset( void*, float, float, uint32, int32, int32 );
#line 15 "star.bas"
void fb_GfxLine( void*, float, float, float, float, uint32, int32, uint32, int32 );
#line 15 "star.bas"
int32 fb_GfxScreenRes( int32, int32, int32, int32, int32, int32 );
#line 15 "star.bas"
void fb_Init( int32, uint8**, int32 );
#line 15 "star.bas"
void fb_End( int32 );
#line 15 "star.bas"
void fb_End( int32 );
#line 15 "star.bas"
void fb_Sleep( int32 );
#line 15 "star.bas"
static double PI$ = 0x1.921FB54442D18p+1;

#line 1 "star.bas"
int32 main( int32 __FB_ARGC__$0, char** __FB_ARGV__$0 )
#line 1 "star.bas"
{
	#line 1 "star.bas"
	int32 fb$result$0;
	#line 1 "star.bas"
	__builtin_memset( &fb$result$0, 0, 4ll );
	#line 1 "star.bas"
	int64 I$0;
	#line 1 "star.bas"
	__builtin_memset( &I$0, 0, 8ll );
	#line 1 "star.bas"
	fb_Init( __FB_ARGC__$0, (uint8**)__FB_ARGV__$0, 2 );
	#line 1 "star.bas"
	label$0:;
	// #lang "fblite"
	// const sw = 800
	// const sh = 600
	// dim shared as double pi = 2*asin(1)
	// screenres sw, sh, 32
	#line 6 "star.bas"
	fb_GfxScreenRes( 800, 600, 32, 1, 0, 0 );
	// i=0
	#line 8 "star.bas"
	I$0 = 0ll;
	// pset (200*cos(2*pi*i/5) + sw/2, sh/2 - 200*sin(2*pi*i/5))
	#line 9 "star.bas"
	fb_GfxPset( (void*)0ull, (float)((__builtin_cos( (((PI$ * (double)I$0) * 0x1.p+1) / 0x1.4p+2) ) * 0x1.9p+7) + 0x1.9p+8), (float)(-(__builtin_sin( (((PI$ * (double)I$0) * 0x1.p+1) / 0x1.4p+2) ) * 0x1.9p+7) + 0x1.2Cp+8), 0u, -2147483644, 0 );
	// for i=0 to 5
	{
		#line 10 "star.bas"
		I$0 = 0ll;
		#line 10 "star.bas"
		label$5:;
		{
			// 	line -(200*cos(2*pi*i*2/5) + sw/2, sh/2 - 200*sin(2*pi*i*2/5))
			#line 11 "star.bas"
			fb_GfxLine( (void*)0ull, 0x0p+0f, 0x0p+0f, (float)((__builtin_cos( (((PI$ * (double)I$0) * 0x1.p+2) / 0x1.4p+2) ) * 0x1.9p+7) + 0x1.9p+8), (float)(-(__builtin_sin( (((PI$ * (double)I$0) * 0x1.p+2) / 0x1.4p+2) ) * 0x1.9p+7) + 0x1.2Cp+8), 0u, 0, 65535u, -2147483646 );
			// next
		}
		#line 12 "star.bas"
		label$3:;
		#line 12 "star.bas"
		I$0 = I$0 + 1ll;
		#line 12 "star.bas"
		label$2:;
		#line 12 "star.bas"
		if( I$0 <= 5ll ) goto label$5;
		#line 12 "star.bas"
		label$4:;
	}
	// sleep
	#line 14 "star.bas"
	fb_Sleep( -1 );
	// system
	#line 15 "star.bas"
	fb_End( 0 );
	#line 15 "star.bas"
	label$1:;
	#line 15 "star.bas"
	fb_End( 0 );
	#line 15 "star.bas"
	return fb$result$0;
#line 15 "star.bas"
}
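The generated C above expresses `for i=0 to 5` with labels and `goto` rather than a `for` statement. A small C++ sketch of that lowering, with hypothetical function names, shows that both forms run the body the same number of times. Note that the label form tests the bound only after the body; that is safe here because the constant bounds guarantee at least one pass, and whether FreeBASIC emits an entry test in other cases is not something this excerpt shows.

```cpp
// Structured form: BASIC's "for i = 0 to 5" includes the upper bound.
int count_with_for() {
    int passes = 0;
    for (long long i = 0; i <= 5; ++i) {
        ++passes;
    }
    return passes;
}

// Label-and-goto form, following the shape of the generated C:
// initialize, run the body, increment, then jump back while i <= 5.
int count_with_goto() {
    int passes = 0;
    long long i = 0;
body:
    ++passes;
    i = i + 1;
    if (i <= 5) goto body;
    return passes;
}
```

Both functions return 6, one pass for each value of `i` from 0 through 5.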

Title: Re: How is a QB64 exe file made?
Post by: romichess on February 24, 2020, 09:26:11 pm

Quote from: _vince on February 24, 2020, 08:18:05 pm

freebasic can output the full asm file right before it gets assembled, but it also has inline assembly support so you'd never care to modify it anyway (Not that you should ever try to hand optimize modern asm)

For 99.9% of the use cases out there you are 100% correct. There is no longer a need for handwritten assembler. However, there are special cases where handwritten assembler is a must. I gave one in my code example above. In chess programming, for the last 12 years give or take, magic bitboards have been untouchable for speed in move generation. It is just 7 machine language instructions for a rook or a bishop and 14 (R + B) for the queen. That would be unbeatable forever if it were not for two facts (three on Intel). First, the 7 instructions form a single dependency chain, which makes the code unfriendly to dual execution pipes. Second, it has one imul instruction, which is slightly more expensive even on today's processors. The third, for Intel, is that Intel CPUs have slow shift instructions, and magic uses several shifts.

Now, my handwritten assembler for my new bitboard approach has 20 instructions, but they are split between two dependency chains that have zero cross dependencies, and thus will run in both pipes as though they were only 10 instructions running in one pipe. Overall they are also faster instructions, with no imul and only one shift. And that is why, in special cases, handwritten assembler can still be superior when it counts! :)
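The two-chain idea can be sketched in C++ as well, under the assumption that eight 64-bit values are being ANDed together as in the queen lookup. The function names are hypothetical. The first version feeds every AND from the previous one; the second keeps two accumulators that never read each other until the final merge, which is the structure the handwritten assembler uses with `rax` and `rcx`. Whether this changes the timing depends on the compiler and the CPU, since an optimizer is free to reorder ANDs on its own.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// One dependency chain: each AND needs the result of the one before it.
std::uint64_t and_one_chain(const std::array<std::uint64_t, 8>& values) {
    std::uint64_t acc = values[0];
    for (std::size_t i = 1; i < values.size(); ++i) {
        acc &= values[i];
    }
    return acc;
}

// Two independent chains: even and odd elements are accumulated separately
// and combined with a single AND at the end.
std::uint64_t and_two_chains(const std::array<std::uint64_t, 8>& values) {
    std::uint64_t even = values[0];
    std::uint64_t odd = values[1];
    for (std::size_t i = 2; i < values.size(); i += 2) {
        even &= values[i];
        odd &= values[i + 1];
    }
    return even & odd;
}
```

Because AND is associative and commutative, both functions always return the same value; only the shape of the dependency graph differs.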

QB64.org Forum
