I've played around with this a little to see how many FPS we can actually get out of it in QB64.
As it stands, in its original form, I get about 17-18 FPS on my machine. With a few minor tweaks, which I'll explain below, I can get that up to about 26-28 FPS, which I find quite reasonable.
Here's my little test program and changes:
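```qb64
'conversion of https://www.shadertoy.com/view/NllGRM
'in QB64 By Ashish for Richard
'24 May, 2021
TYPE vec3
    x AS SINGLE
    y AS SINGLE
    z AS SINGLE
END TYPE
DIM clr AS vec3, final AS vec3
DIM m AS _MEM, i AS LONG, j AS LONG
SCREEN _NEWIMAGE(400, 400, 32)
m = _MEMIMAGE(0) 'the screen's pixels: 4 bytes each, stored row after row

'=== Original code: PSET and the untouched math ===
iTime = 0: fps = 0
start# = TIMER(.001)
DO
    FOR i = 0 TO 399
        FOR j = 0 TO 399
            u = j / 400
            v = i / 400
            t = ABS(SIN(15 * ((u - 0.5 + 0.3 * SIN(iTime * 2)) ^ 2 + (v - 0.5 + 0.3 * COS(iTime * 2)) ^ 2) ^ 0.5 - iTime * 5))
            clr.x = t: clr.y = 1 - t: clr.z = 1 - t
            final = clr
            t = COS(u * 20 + iTime * 5) ^ 2
            clr.x = 1 - t: clr.y = t: clr.z = 0
            final.x = final.x + clr.x
            final.y = final.y + clr.y
            final.z = final.z + clr.z
            PSET (j, i), _RGB32(final.x * 255, final.y * 255, final.z * 255)
        NEXT
    NEXT
    fps = fps + 1
    iTime = iTime + 0.05
LOOP UNTIL TIMER(.001) - start# >= 10 'run each version for about 10 seconds
fpsOriginal = fps / (TIMER(.001) - start#)

'=== Changed code: _MEMPUT, trimmed assignments, v moved out of the j loop, _HYPOT ===
iTime = 0: fps = 0
start# = TIMER(.001)
DO
    i = 0
    DO
        j = 0
        v = i / 400 'depends only on i, so calculate it once per row
        DO
            u = j / 400
            i2 = iTime * 2
            i5 = iTime * 5
            ' t = ABS(SIN(15 * ((u - 0.5 + 0.3 * SIN(i2)) ^ 2 + (v - 0.5 + 0.3 * COS(i2)) ^ 2) ^ 0.5 - i5))
            t = ABS(SIN(15 * _HYPOT(u - 0.5 + 0.3 * SIN(i2), v - 0.5 + 0.3 * COS(i2)) - i5))
            final.x = t: final.z = 1 - t
            t = COS(u * 20 + i5) ^ 2
            clr.x = 1 - t: clr.y = t ': clr.z = 0
            final.x = final.x + clr.x
            final.y = final.z + clr.y
            ' final.z = final.z + clr.z
            _MEMPUT m, (m.OFFSET + i * 400 * 4) + j * 4, _RGB32(final.x * 255, final.y * 255, final.z * 255) AS _UNSIGNED LONG ' PSET (j, i), _RGB32(final.x * 255, final.y * 255, final.z * 255)
            j = j + 1
        LOOP UNTIL j = 400
        i = i + 1
    LOOP UNTIL i = 400
    fps = fps + 1
    iTime = iTime + 0.05
LOOP UNTIL TIMER(.001) - start# >= 10
fpsChanged = fps / (TIMER(.001) - start#)

_MEMFREE m
PRINT "Original:"; fpsOriginal; "FPS"
PRINT "Changed: "; fpsChanged; "FPS"
```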
Contrary to what you might expect, using _MEM to write the pixel data directly to the screen doesn't help much in this instance. Replacing PSET with _MEMPUT only gains about 1 FPS; our bottleneck is all the math we're doing for every pixel.
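For anyone curious what that _MEMPUT line is actually doing, here's a small stand-alone sketch. It assumes a 32-bit image stored row after row with no padding (the same layout the test program's i * 400 * 4 + j * 4 offset relies on), writes one pixel with _MEMPUT and its neighbor with PSET, then reads each back the other way. The image size, coordinates, and color are arbitrary.

```qb64
'Sketch: in a 32-bit image each pixel is one 4-byte value, stored row by row,
'so pixel (x, y) starts at byte (y * width + x) * 4 of the image's memory.
DIM img AS LONG, m AS _MEM, w AS LONG
DIM c AS _UNSIGNED LONG, back AS _UNSIGNED LONG
w = 400
img = _NEWIMAGE(w, 400, 32)
m = _MEMIMAGE(img)
c = _RGB32(10, 200, 30)

'Write pixel (123, 45) straight into memory, like the changed loop does...
_MEMPUT m, m.OFFSET + (45 * w + 123) * 4, c
'...and draw pixel (124, 45) the usual way.
_DEST img
PSET (124, 45), c
_DEST 0

'Read both back: POINT for the _MEMPUT pixel, _MEMGET for the PSET one.
_SOURCE img
back = _MEMGET(m, m.OFFSET + (45 * w + 124) * 4, _UNSIGNED LONG)
PRINT HEX$(POINT(123, 45)), HEX$(back)
_SOURCE 0
_MEMFREE m
```

Both routes land on the same bytes, which is why the swap doesn't change the picture; it only skips the overhead of the PSET call.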
If you look at the code, you can see where I ripped out some unneeded assignments that were either duplicates or values that never change:
```qb64
clr.x = t: clr.y = 1 - t: clr.z = 1 - t
final = clr
```
This is one. Why do the math in one set of variables, just to copy it into a different set right afterward?
Instead, I swapped this over to a simple:
```qb64
final.x = t: final.z = 1 - t
```
(Also, if you look above, you'll see I ripped out the duplicate 1 - t math. final.y and final.z start out with the same value, so there's no need for two variables to calculate and hold it; the code later builds final.y directly from final.z with final.y = final.z + clr.y.)
```qb64
clr.z = 0
```
I ripped this out. The value never changes from 0, so there's no need to keep reassigning it.
```qb64
final.z = final.z + clr.z
```
I ripped this out as well. Since clr.z is always 0, adding it to final.z changes nothing, so final.z simply keeps the value it got the first time.
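To double-check that trimming these assignments doesn't change the picture, the sketch below runs both versions of the color math side by side over a grid of sample t values (t1 standing in for the first formula, t2 for the COS formula) and counts any results that differ. The sample grid and the 1E-6 tolerance are arbitrary choices.

```qb64
'Sketch: original vs. trimmed color assignments for sample pairs of t values.
TYPE vec3
    x AS SINGLE
    y AS SINGLE
    z AS SINGLE
END TYPE
DIM clr AS vec3, final AS vec3, before AS vec3
bad = 0
FOR k1 = 0 TO 10
    FOR k2 = 0 TO 10
        t1 = k1 / 10: t2 = k2 / 10
        'original: fill clr, copy it into final, then add the second color
        clr.x = t1: clr.y = 1 - t1: clr.z = 1 - t1
        final = clr
        clr.x = 1 - t2: clr.y = t2: clr.z = 0
        final.x = final.x + clr.x
        final.y = final.y + clr.y
        final.z = final.z + clr.z
        before = final 'keep the original result for comparison
        'trimmed: no copy, no clr.z, and final.y built from final.z
        final.x = t1: final.z = 1 - t1
        clr.x = 1 - t2: clr.y = t2
        final.x = final.x + clr.x
        final.y = final.z + clr.y
        IF ABS(final.x - before.x) > 1E-6 OR ABS(final.y - before.y) > 1E-6 OR ABS(final.z - before.z) > 1E-6 THEN bad = bad + 1
    NEXT
NEXT
PRINT "Mismatches:"; bad
```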
You'll also see where I've boiled a couple of repeated calculations down to single operations:
```qb64
i2 = iTime * 2
i5 = iTime * 5
```
Usually, I find it faster to calculate a value once and store it in a variable than to calculate it multiple times. i2 is used twice per pixel (inside SIN and COS), as is i5 (once in each t formula), so I do the math once and reuse it. Honestly, though, it makes little difference here, since each value is only used twice per pixel....
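Whether storing i2 and i5 pays off is easy to measure on your own machine. This rough sketch times 1,600,000 passes (ten 400 x 400 frames' worth) of using iTime * 2 and iTime * 5 inline versus storing them first, keeping a running sum so both loops do the same work. The iTime value and pass count are arbitrary, and timings will vary by machine.

```qb64
'Sketch: recalculating iTime * 2 and iTime * 5 on every use vs. storing them once per pass.
DIM k AS LONG, n AS LONG
n = 1600000 'ten frames' worth of 400 x 400 pixels
iTime = 1.25
sumA# = 0: sumB# = 0

start# = TIMER(.001)
FOR k = 1 TO n
    sumA# = sumA# + SIN(iTime * 2) + COS(iTime * 2) + iTime * 5 + iTime * 5
NEXT
timeA# = TIMER(.001) - start#

start# = TIMER(.001)
FOR k = 1 TO n
    i2 = iTime * 2 'calculated once, used twice
    i5 = iTime * 5
    sumB# = sumB# + SIN(i2) + COS(i2) + i5 + i5
NEXT
timeB# = TIMER(.001) - start#

PRINT "Recalculated:"; timeA#; "s   Stored:"; timeB#; "s"
```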
Another bit of math I moved is the calculation of v. Since its value depends only on i, I've moved it inside the i loop but outside the j loop. That way it's calculated once per row (400 times per frame) instead of once per pixel (160,000 times per frame)...
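To put numbers on that, this sketch counts how many times v = i / 400 runs in each placement over one 400 x 400 frame. The counters exist only for illustration.

```qb64
'Sketch: how often v = i / 400 runs per frame in each placement.
DIM i AS LONG, j AS LONG, inner AS LONG, hoisted AS LONG

FOR i = 0 TO 399
    FOR j = 0 TO 399
        v = i / 400 'original placement: once per pixel
        inner = inner + 1
    NEXT
NEXT

FOR i = 0 TO 399
    v = i / 400 'moved placement: once per row, since v depends only on i
    hoisted = hoisted + 1
    FOR j = 0 TO 399
        u = j / 400
    NEXT
NEXT

PRINT "Per pixel:"; inner; "  Per row:"; hoisted
```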
All of these little math changes add up to a difference of about 2-3 FPS overall.
Which probably has you guys who can count saying, "Wait a moment! We went from 17-18 to 26-28, and you've only pinpointed a change of about 4 FPS. Where's the rest of the speed boost?"
The rest comes from this single change, which is by far the most important one for performance:
From:
```qb64
t = ABS(SIN(15 * ((u - 0.5 + 0.3 * SIN(i2)) ^ 2 + (v - 0.5 + 0.3 * COS(i2)) ^ 2) ^ 0.5 - i5))
```
To:
```qb64
t = ABS(SIN(15 * _HYPOT(u - 0.5 + 0.3 * SIN(i2), v - 0.5 + 0.3 * COS(i2)) - i5))
```
_HYPOT(x, y) returns the same value as (x ^ 2 + y ^ 2) ^ 0.5, the length of the hypotenuse. In this case, the _HYPOT function is quite a bit faster than calculating that distance manually with:
```qb64
((u - 0.5 + 0.3 * SIN(i2)) ^ 2 + (v - 0.5 + 0.3 * COS(i2)) ^ 2) ^ 0.5
```
That single change to _HYPOT ups my FPS from about 20 to 27, plus or minus a point.
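Since _HYPOT(x, y) is just the square root of x ^ 2 + y ^ 2, the switch shouldn't change the picture. This sketch checks that across every pixel of one frame (at an arbitrary iTime of 0.75), then times 1,600,000 evaluations of each form. The timing loops use fixed x and y values, so treat those numbers as a rough comparison only.

```qb64
'Sketch: _HYPOT vs. (x ^ 2 + y ^ 2) ^ 0.5, first for accuracy, then for speed.
DIM i AS LONG, j AS LONG, k AS LONG
iTime = 0.75
i2 = iTime * 2
maxDiff# = 0
FOR i = 0 TO 399
    v = i / 400
    FOR j = 0 TO 399
        u = j / 400
        x = u - 0.5 + 0.3 * SIN(i2)
        y = v - 0.5 + 0.3 * COS(i2)
        d# = ABS(_HYPOT(x, y) - (x ^ 2 + y ^ 2) ^ 0.5)
        IF d# > maxDiff# THEN maxDiff# = d#
    NEXT
NEXT
PRINT "Largest difference:"; maxDiff#

x = 0.3: y = 0.4
start# = TIMER(.001)
FOR k = 1 TO 1600000
    h = (x ^ 2 + y ^ 2) ^ 0.5
NEXT
viaPower# = TIMER(.001) - start#
start# = TIMER(.001)
FOR k = 1 TO 1600000
    h = _HYPOT(x, y)
NEXT
viaHypot# = TIMER(.001) - start#
PRINT "^ 0.5:"; viaPower#; "s   _HYPOT:"; viaHypot#; "s"
```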
As I keep looking this over, there's probably even more we can do to optimize the math.
i2 and i5 depend only on iTime, so they never change during a frame; we should be able to safely move their calculation outside both loops and do it once per frame. v only changes when i does, so we can also precalculate the y value of our main formula, v - 0.5 + 0.3 * COS(i2), in the i loop, once per row instead of once per pixel.
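Here's a rough sketch of that suggestion, assuming the same 400 x 400, 32-bit screen and _MEMPUT setup as the test program. i2 and i5 are calculated once per frame, and the y argument of _HYPOT (yTerm, a hypothetical name) is calculated once per row from a per-frame yShift. As an extra step beyond what's described above, it also pulls the constant part of the x argument, 0.3 * SIN(i2) - 0.5, out into a per-frame xShift (another hypothetical name). The 5-second run length is arbitrary.

```qb64
'Sketch: hoisting per-frame and per-row math out of the pixel loop.
TYPE vec3
    x AS SINGLE
    y AS SINGLE
    z AS SINGLE
END TYPE
DIM clr AS vec3, final AS vec3
DIM m AS _MEM, i AS LONG, j AS LONG
SCREEN _NEWIMAGE(400, 400, 32)
m = _MEMIMAGE(0)
iTime = 0: frames = 0
start# = TIMER(.001)
DO
    i2 = iTime * 2 'once per frame instead of once per pixel
    i5 = iTime * 5
    xShift = 0.3 * SIN(i2) - 0.5 'extra step: constant part of the x argument
    yShift = 0.3 * COS(i2) - 0.5
    FOR i = 0 TO 399
        yTerm = i / 400 + yShift 'same as v - 0.5 + 0.3 * COS(i2), once per row
        FOR j = 0 TO 399
            u = j / 400
            t = ABS(SIN(15 * _HYPOT(u + xShift, yTerm) - i5))
            final.x = t: final.z = 1 - t
            t = COS(u * 20 + i5) ^ 2
            clr.x = 1 - t: clr.y = t
            final.x = final.x + clr.x
            final.y = final.z + clr.y
            _MEMPUT m, m.OFFSET + (i * 400 + j) * 4, _RGB32(final.x * 255, final.y * 255, final.z * 255) AS _UNSIGNED LONG
        NEXT
    NEXT
    frames = frames + 1
    iTime = iTime + 0.05
LOOP UNTIL TIMER(.001) - start# >= 5
PRINT frames / (TIMER(.001) - start#); "FPS"
```

Everything left inside the j loop now depends on j itself, which is where the remaining per-pixel cost lives.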
The more math you can move outside a loop, the faster that loop will run. It's a simple concept, really, and one people have drifted away from with the speed of modern computers: doing something once is much faster than doing the same thing 400 times.. ;)