But IF Luke doesn't know THEN we are in deep do-do!
I just barely made it out of deep do-do a couple of hours ago. I heard water running, and after checking the inside of the house for a broken pipe and not finding one, I went outside to check the faucet. I never made it to the faucet. The snow was too deep, and I fell down and could not get back on my feet. I had to crawl through the snow to get to the driveway, and I barely made it. I was almost too weak to stand but somehow made it to my feet. I was hyperventilating badly and my heart was pounding furiously in my chest. I thought that I was going to pass out. But somehow I made it into the house and to my bed, where I stayed for probably over an hour. Then I got up and went down into the basement to turn off the water. Then of course I sat down in my chair, turned on my computer and visited one of my favorite internet sites, this one! One must have priorities. :)
Concerning the OP and the question of whether QB64 is a pure interpreter, compiles to bytecode, or compiles to native CPU code (either JIT or to an exe beforehand): there was never any question in my mind. QB64 executes way too fast to be any type of interpreter. The only question for me is who can write faster C++ code, QB64 or myself. That is complicated by the fact that if the BASIC code is not fast, then the C++ code is most likely not going to be fast either.
I personally do not trust compilers, and with good reason. One optimized C routine I wrote compiled to, iirc, about 23 machine language instructions. I hand-coded the same routine using only 6 machine instructions. Just last week I wrote a function in C++ that looked fast, but it confused the bejeebers out of every compiler I tested except one. The current experimental Clang compiler was able to handle it quite well.
The original C++ function is a new way of computing 64-bit bitboards for move generation in chess. The name is Split Index Super Set Yielding bitboards. Or for short, SISSY bitboards.
' The C++ function
h->moves[id] = qss[fs][occ.b08.rank1][0]
             & qss[fs][occ.b08.rank2][1]
             & qss[fs][occ.b08.rank3][2]
             & qss[fs][occ.b08.rank4][3]
             & qss[fs][occ.b08.rank5][4]
             & qss[fs][occ.b08.rank6][5]
             & qss[fs][occ.b08.rank7][6]
             & qss[fs][occ.b08.rank8][7];
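For anyone who wants to play with the idea, here is a minimal, self-contained sketch of how such a lookup could work. It is not the original SISSY code. The table fill in initQss is my assumption about what "super set" means: each entry is the queen attack set computed with only the blockers on one rank, so it is a superset of the true attack set, and ANDing the eight entries leaves the exact set. The names slowQueenAttacks, initQss and queenAttacks are made up for illustration, and the occupancy bytes are extracted with shifts instead of the occ.b08 union so the sketch does not depend on byte order.

```cpp
#include <cassert>
#include <cstdint>

using U64 = std::uint64_t;

// Same index order as the function above: [square][rank occupancy byte][rank].
// 64 * 256 * 8 entries of 8 bytes each = 1 MiB.
static U64 qss[64][256][8];

// Slow reference: walk each of the eight queen rays until the first blocker.
U64 slowQueenAttacks(int sq, U64 occ) {
    static const int dr[8] = {1, 1, 1, 0, 0, -1, -1, -1};
    static const int df[8] = {-1, 0, 1, -1, 1, -1, 0, 1};
    U64 attacks = 0;
    for (int d = 0; d < 8; ++d) {
        int r = sq / 8 + dr[d];
        int f = sq % 8 + df[d];
        while (r >= 0 && r < 8 && f >= 0 && f < 8) {
            U64 bit = U64{1} << (r * 8 + f);
            attacks |= bit;          // the blocker square itself is attacked
            if (occ & bit) break;    // stop at the first blocker on this ray
            r += dr[d];
            f += df[d];
        }
    }
    return attacks;
}

// ASSUMPTION: entry [sq][b][k] only knows about the blockers on rank k,
// so it is a superset of the true attack set for any full occupancy.
void initQss() {
    for (int sq = 0; sq < 64; ++sq)
        for (int b = 0; b < 256; ++b)
            for (int k = 0; k < 8; ++k)
                qss[sq][b][k] = slowQueenAttacks(sq, U64(b) << (8 * k));
}

// Split the occupancy into eight rank bytes and AND the eight supersets.
U64 queenAttacks(int sq, U64 occ) {
    assert(sq >= 0 && sq < 64);
    U64 result = ~U64{0};
    for (int k = 0; k < 8; ++k)
        result &= qss[sq][(occ >> (8 * k)) & 0xFF][k];
    return result;
}
```

The AND works because every ray in every superset is a prefix of the full ray, and the shortest prefix is the one cut off by the nearest blocker, whichever rank that blocker sits on. Call initQss() once at startup before using queenAttacks.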
' Clang experimental
.Lfunc_begin0:
push rbx
.Ltmp0:
mov rdx, rsi
mov rcx, rsi
mov r11, rsi
mov r10, rsi
mov r9, rsi
mov r8, rsi
movzx ebx, sil
mov rax, rsi
.Ltmp1:
shr rax, 2
.Ltmp2:
shr rdx, 10
shr rcx, 18
shr r11, 26
shr r10, 34
movsxd rsi, edi
shl rbx, 6
shl rsi, 14
mov rax, qword ptr [rsi + rax + qss+8]
and rax, qword ptr [rsi + rbx + qss]
shr r9, 42
and rax, qword ptr [rsi + rdx + qss+16]
shr r8, 50
and rax, qword ptr [rsi + rcx + qss+24]
and rax, qword ptr [rsi + r11 + qss+32]
and rax, qword ptr [rsi + r10 + qss+40]
and rax, qword ptr [rsi + r9 + qss+48]
and rax, qword ptr [rsi + r8 + qss+56]
pop rbx
ret
' My handwritten assembler
_DATA SEGMENT
bbs STRUCT
r1 BYTE ?
r2 BYTE ?
r3 BYTE ?
r4 BYTE ?
r5 BYTE ?
r6 BYTE ?
r7 BYTE ?
r8 BYTE ?
bbs ENDS
bbu UNION
bbs<>
b64 QWORD ?
bbu ENDS
occ bbu<>
_DATA ENDS
_TEXT SEGMENT
RayAttacks PROC
; rcx = sq
; rdx = address of rss
; r8 = occ
shl rcx, 11 ; sq * 2048
mov occ.b64, r8
add rdx, rcx
movzx r8, occ.r1
movzx r9, occ.r2
mov rax, [rdx + r8 * 8]
mov rcx, [rdx + r9 * 8 + 131072]
movzx r8, occ.r3
movzx r9, occ.r4
and rax, [rdx + r8 * 8 + (2 * 131072)]
and rcx, [rdx + r9 * 8 + (3 * 131072)]
movzx r8, occ.r5
movzx r9, occ.r6
and rax, [rdx + r8 * 8 + (4 * 131072)]
and rcx, [rdx + r9 * 8 + (5 * 131072)]
movzx r8, occ.r7
movzx r9, occ.r8
and rax, [rdx + r8 * 8 + (6 * 131072)]
and rcx, [rdx + r9 * 8 + (7 * 131072)]
and rax, rcx ; combine the two partial results
ret
RayAttacks ENDP
_TEXT ENDS
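Two things in the assembler listing differ from the C++ function: the offsets of k * 131072 imply a rank-major table (131072 bytes = 64 squares * 256 occupancy bytes * 8 bytes per entry), and the ANDs alternate between two registers, rax and rcx, so there are two short dependency chains instead of one long one. Here is a rough C++ sketch of the same structure. The array name rss comes from the comment in the listing, but the declaration, the function name rayAttacks and the shift-based byte extraction are my own illustration, not the original source.

```cpp
#include <cassert>
#include <cstdint>

using U64 = std::uint64_t;

// ASSUMED rank-major layout: [rank][square][rank occupancy byte].
// One rank slice is 64 * 256 * 8 bytes = 131072 bytes.
static U64 rss[8][64][256];

U64 rayAttacks(int sq, U64 occ) {
    assert(sq >= 0 && sq < 64);
    // Two independent AND chains, mirroring the rax and rcx accumulators.
    U64 even = rss[0][sq][occ & 0xFF];
    U64 odd  = rss[1][sq][(occ >> 8) & 0xFF];
    for (int k = 2; k < 8; k += 2) {
        even &= rss[k][sq][(occ >> (8 * k)) & 0xFF];
        odd  &= rss[k + 1][sq][(occ >> (8 * (k + 1))) & 0xFF];
    }
    // Both halves have to be merged before returning a single result.
    return even & odd;
}
```

Whether a compiler keeps the two chains separate is up to the compiler, so this only shows the intent. A routine that returns its answer in rax needs the same final merge of the two accumulators.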
I'd put my handwritten assembler up against Clang any day of the week.
This was just in case anyone is interested in the workings behind the scenes.