Topic: Old Skool Plasma Effect


Offline Ashish

  • Forum Resident
  • Posts: 630
  • Never Give Up!
Old Skool Plasma Effect
« on: May 24, 2021, 08:06:51 am »
Hey guys! I'm currently learning GLSL, and I recently created a plasma-effect shader - https://www.shadertoy.com/view/NllGRM - which ChiaPet asked me on Discord to convert to QB64.
So, I converted it to QB64. :)

Code: QB64:
'conversion of https://www.shadertoy.com/view/NllGRM
'in QB64 By Ashish for Richard
'24 May, 2021
_Title "PLASMA EFFECT"
Screen _NewImage(400, 400, 32)
Type vec3
    As Single x, y, z
End Type
Dim clr As vec3, final As vec3
iTime = 0
Do
    For i = 0 To 400
        For j = 0 To 400
            'normalized pixel coordinates
            u = j / 400
            v = i / 400
            'rings around a centre that circles with time
            t = Abs(Sin(15 * ((u - 0.5 + 0.3 * Sin(iTime * 2)) ^ 2 + (v - 0.5 + 0.3 * Cos(iTime * 2)) ^ 2) ^ 0.5 - iTime * 5))
            clr.x = t: clr.y = 1 - t: clr.z = 1 - t
            final = clr
            'vertical bands scrolling with time
            t = Cos(u * 20 + iTime * 5) ^ 2
            clr.x = 1 - t: clr.y = t: clr.z = 0
            final.x = final.x + clr.x
            final.y = final.y + clr.y
            final.z = final.z + clr.z
            PSet (j, i), _RGB32(final.x * 255, final.y * 255, final.z * 255)
        Next j
    Next i
    _Display
    iTime = iTime + 0.05
Loop
« Last Edit: May 25, 2021, 07:41:30 am by Ashish »
if (Me.success) {Me.improve()} else {Me.tryAgain()}


My Projects - https://github.com/AshishKingdom?tab=repositories
OpenGL tutorials - https://ashishkingdom.github.io/OpenGL-Tutorials

Offline Dav

  • Forum Resident
  • Posts: 792
Re: Old Skool Plasma Effect
« Reply #1 on: May 24, 2021, 09:28:33 am »
Nice one!  Was thinking it would be cool to use a plasma effect in a sound visualizer somehow, like the sound 2d visualizer thing you posted earlier.   

- Dav

Offline Richard Frost

  • Seasoned Forum Regular
  • Posts: 316
  • Needle nardle noo. - Peter Sellers
Re: Old Skool Plasma Effect
« Reply #2 on: May 24, 2021, 10:00:38 am »
It's lovely!   I thought it'd take a week to translate that.  Probably doesn't sleep much.
At about 4 frames per second in QB64, it's easy to see why WLSL (or whatever it is)
is being used.  Wouldn't make a very good music visualizer with speeds like that.  Such
is the price of trig and powers.

It works better if you plug it in.

Offline bplus

  • Global Moderator
  • Forum Resident
  • Posts: 8053
  • b = b + ...
Re: Old Skool Plasma Effect
« Reply #3 on: May 24, 2021, 01:07:29 pm »
I had never heard of GLSL before, interesting.

FellippeHeitor

  • Guest
Re: Old Skool Plasma Effect
« Reply #4 on: May 24, 2021, 01:17:07 pm »
It is slower indeed (definitely not 4 fps... I believe Chia's machine needs some plasma transfusion at this point), but the fact it's done entirely without resorting to OpenGL in QB64 is a feat.



I wonder how much of an improvement we'll see when you port it to QB64 + OpenGL.
« Last Edit: May 24, 2021, 01:19:11 pm by FellippeHeitor »

Offline bplus

  • Global Moderator
  • Forum Resident
  • Posts: 8053
  • b = b + ...
Re: Old Skool Plasma Effect
« Reply #5 on: May 24, 2021, 01:21:52 pm »
Quote
but the fact it's done entirely without resorting to OpenGL in QB64 is a feat.

Oh man! I did a whole Plasmatic study in QB64 without GL.
https://www.qb64.org/forum/index.php?topic=1451.0

Richard Frost and I were using it as backgrounds to clocks, and he used it for a chess board FX.

BTW it is slow because you do all that math for each pixel. It can be sped up by using smaller screens and stepping.

So doing it with assembly or maybe memory (Steve) might make a difference.
« Last Edit: May 24, 2021, 01:35:30 pm by bplus »

FellippeHeitor

  • Guest
Re: Old Skool Plasma Effect
« Reply #6 on: May 24, 2021, 02:04:47 pm »
'sall beautiful - plasma goes way over my head, so kudos to you all.

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
Re: Old Skool Plasma Effect
« Reply #7 on: May 24, 2021, 02:34:15 pm »
I've played around with this just a little to see how many FPS we can actually generate with it, with QB64.

As it stands, in its original form, I get about 17-18 FPS on my machine.  With a few minor tweaks, which I'll explain below, I can get it up to about 26-28 FPS, which I find to be quite a reasonable number.

Here's my little test program and changes: 
Code: QB64:
'conversion of https://www.shadertoy.com/view/NllGRM
'in QB64 By Ashish for Richard
'24 May, 2021

_TITLE "PLASMA EFFECT"
SCREEN _NEWIMAGE(400, 400, 32)
TYPE vec3
    AS DOUBLE x, y, z
END TYPE
DIM clr AS vec3, final AS vec3

'Part 1: the original loop, with an FPS counter shown in the title bar
iTime = 0
time## = TIMER + 1
DO
    FOR i = 0 TO 400
        FOR j = 0 TO 400
            u = j / 400
            v = i / 400
            t = ABS(SIN(15 * ((u - 0.5 + 0.3 * SIN(iTime * 2)) ^ 2 + (v - 0.5 + 0.3 * COS(iTime * 2)) ^ 2) ^ 0.5 - iTime * 5))
            clr.x = t: clr.y = 1 - t: clr.z = 1 - t
            final = clr
            t = COS(u * 20 + iTime * 5) ^ 2
            clr.x = 1 - t: clr.y = t: clr.z = 0
            final.x = final.x + clr.x
            final.y = final.y + clr.y
            final.z = final.z + clr.z
            PSET (j, i), _RGB32(final.x * 255, final.y * 255, final.z * 255)
        NEXT j
    NEXT i
    fps = fps + 1
    IF TIMER > time## THEN _TITLE STR$(fps): fps = 0: time## = TIMER + 1
    _DISPLAY
    iTime = iTime + 0.05
LOOP UNTIL _KEYHIT 'assumed exit condition: press a key to move on to the tweaked loop


'Part 2: the tweaked loop, writing pixels straight into screen memory
DIM m AS _MEM: m = _MEMIMAGE(0)

iTime = 0
time## = TIMER + 1
DO
    i = 0
    DO UNTIL i >= 400
        j = 0
        v = i / 400
        DO UNTIL j >= 400
            u = j / 400
            i2 = iTime * 2
            i5 = iTime * 5
            '            t = ABS(SIN(15 * ((u - 0.5 + 0.3 * SIN(i2)) ^ 2 + (v - 0.5 + 0.3 * COS(i2)) ^ 2) ^ 0.5 - i5))
            t = ABS(SIN(15 * _HYPOT(u - 0.5 + 0.3 * SIN(i2), v - 0.5 + 0.3 * COS(i2)) - i5))

            final.x = t: final.z = 1 - t
            t = COS(u * 20 + i5) ^ 2
            clr.x = 1 - t: clr.y = t ': clr.z = 0
            final.x = final.x + clr.x
            final.y = final.z + clr.y
            '            final.z = final.z + clr.z
            $CHECKING:OFF
            'byte offset of pixel (j, i): 400 pixels per row, 4 bytes per pixel
            _MEMPUT m, (m.OFFSET + i * 400 * 4) + j * 4, _RGB32(final.x * 255, final.y * 255, final.z * 255) AS LONG
            $CHECKING:ON
            '            PSET (j, i), _RGB32(final.x * 255, final.y * 255, final.z * 255)
            j = j + 1
        LOOP
        i = i + 1
    LOOP
    fps = fps + 1
    IF TIMER > time## THEN _TITLE STR$(fps): fps = 0: time## = TIMER + 1
    _DISPLAY
    iTime = iTime + 0.05
LOOP

Contrary to what you might expect, using _MEM to put the data directly to the screen really doesn't help much in this instance.  _MEMPUT only gives an increase of about 1FPS -- our bottleneck is all the math involved here and the calculations we're doing.
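The offset arithmetic in the _MEMPUT line can be checked on its own. Here is a minimal sketch in Python (not QB64), assuming a 400x400 image with 4 bytes per pixel and a colour packed as &HFFRRGGBB; the names `pixel_offset`, `rgb32` and `put_pixel` are made up for illustration and are not QB64 calls:

```python
import struct

WIDTH, HEIGHT = 400, 400
BYTES_PER_PIXEL = 4  # a 32-bit image stores one 4-byte value per pixel

# Stand-in for the screen memory block (hypothetical, not a QB64 _MEM object).
buf = bytearray(WIDTH * HEIGHT * BYTES_PER_PIXEL)

def rgb32(r, g, b):
    # Pack a colour as 0xFFRRGGBB, i.e. full alpha plus red, green, blue.
    return (255 << 24) | (r << 16) | (g << 8) | b

def pixel_offset(row, col):
    # Same arithmetic as i * 400 * 4 + j * 4 in the listing above.
    return (row * WIDTH + col) * BYTES_PER_PIXEL

def put_pixel(row, col, colour):
    # Write the 4-byte value in little-endian order at the pixel's offset.
    struct.pack_into("<I", buf, pixel_offset(row, col), colour)

put_pixel(2, 3, rgb32(10, 20, 30))
```

The last valid pixel, (399, 399), sits 4 bytes before the end of the buffer. That is presumably why the tweaked loops stop at `i >= 400` and `j >= 400` instead of running to 400 as the FOR loops do.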

If you look at the code, you can see where I ripped out some unneeded assignments which were basically duplicates, or unchanging values:
            clr.x = t: clr.y = 1 - t: clr.z = 1 - t
            final = clr  <--  This is one.  Why do the math in one set of variables, just to assign them immediately after to a different set?

Instead, I swapped this over to a simple:
            final.x = t: final.z = 1 - t

(Also if you look above, you'll see I ripped the duplicate math out for final.z -- It's the same value as final.y in this case.  No need for 2 variables to calculate and hold the values here.)

': clr.z = 0  <-- This I ripped out.  The value doesn't change from 0, so we don't need to reassign it over and over to a variable.

            '            final.z = final.z + clr.z   <-- This I ripped out as well.  clr.z is 0, so we really don't need to add a constant 0 to final.z after we calculate it the first time.
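One way to convince yourself that the trimmed assignments give the same colour is to compare both versions side by side. This is a small sketch in Python (not QB64); `color_original` and `color_trimmed` are illustrative names, and `t1` and `t2` stand for the two t values computed per pixel:

```python
def color_original(t1, t2):
    # Mirrors the original body: build clr, copy it to final, then add a second clr.
    clr = [t1, 1 - t1, 1 - t1]
    final = list(clr)
    clr = [1 - t2, t2, 0.0]
    return tuple(f + c for f, c in zip(final, clr))

def color_trimmed(t1, t2):
    # Mirrors the trimmed body: no copy, no clr.z, and final.y built from final.z.
    final_x = t1
    final_z = 1 - t1
    final_x = final_x + (1 - t2)
    final_y = final_z + t2
    return (final_x, final_y, final_z)

# Both versions should agree for any pair of t values in 0..1.
same = color_original(0.25, 0.75) == color_trimmed(0.25, 0.75)
```

The trimmed version relies on final.y never being read before it is assigned from final.z, which holds in the listing above.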

If you look, you'll also see where I've simplified a little math down to singular operations:
            i2 = iTime * 2
            i5 = iTime * 5

Usually, I find it faster to calculate a value once and store it in a variable, than I do to calculate the value multiple times.  If you look, you'll see that i2 is used twice, as is i5.  It just seems faster to me to do the math once and store, rather than calculate it multiple times here, but it honestly makes little difference since we're only dealing with 2 instances of both per cycle....

Another bit of math which I moved was the calculation of v.  Since its value is wholly dependent on i, I've moved it inside the i loop and outside of the j loop.  Calculate it once per row, rather than 400+ times...

All of these little math changes add up to a difference of about 2-3 FPS overall.

Which probably has you guys who can count saying, "Wait a moment!  We went from 16 to 28, and you've only pinpointed a change of about 4 FPS.  Where's the rest of the speed boost??"

The rest comes from this single change, which is by far the most important in altering performance:

From:

t = ABS(SIN(15 * ((u - 0.5 + 0.3 * SIN(i2)) ^ 2 + (v - 0.5 + 0.3 * COS(i2)) ^ 2) ^ 0.5 - i5))

To:

t = ABS(SIN(15 * _HYPOT(u - 0.5 + 0.3 * SIN(i2), v - 0.5 + 0.3 * COS(i2)) - i5))

In this case, the _HYPOT function is quite a bit faster than manually calculating the formula: ((u - 0.5 + 0.3 * SIN(i2)) ^ 2 + (v - 0.5 + 0.3 * COS(i2)) ^ 2) ^ 0.5

A single change to use _HYPOT ups my FPS from about 20 to 27, plus or minus a point.
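The two forms compute the same distance, which is easy to confirm numerically. Below is a short sketch in Python (not QB64), using `math.hypot` as a stand-in for _HYPOT; `radius_pow` and `radius_hypot` are illustrative names:

```python
import math

def radius_pow(a, b):
    # Manual form: square both terms, add them, then raise to the power 0.5.
    return (a ** 2 + b ** 2) ** 0.5

def radius_hypot(a, b):
    # Library form: a single call that returns sqrt(a*a + b*b).
    return math.hypot(a, b)

# Compare the two on a few offsets of the kind the plasma formula produces.
samples = [(-0.8, -0.8), (-0.2, 0.3), (0.0, 0.0), (0.5, -0.1), (0.8, 0.8)]
max_diff = max(abs(radius_pow(a, b) - radius_hypot(a, b)) for a, b in samples)
```

This only shows that the results match. Any timing from Python would say nothing about QB64, so the FPS figures above remain the measurement to go by.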



As I keep looking over this, there's probably even more that we can do to optimize the math even better.

Since v, i2, and i5 don't change values inside the j loop, we can probably safely move the i2 and i5 calculations outside the j loop, as well as precalculate the y term of our main math formula there: v - 0.5 + 0.3 * COS(i2)

The more math you can move outside a loop, the faster that loop is going to become.  It's a rather simple concept really, which people have gotten away from with the speed of modern computers -- doing something once is much faster than doing the same thing 400 times..  ;)
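The idea can be sketched outside QB64 as well. The following Python (not QB64) computes the ring term both ways and lets you compare them; `frame_naive` and `frame_hoisted` are illustrative names, and the hoisted version goes one step further than the post by computing the time-only values once per frame:

```python
import math

SIZE = 400

def frame_naive(i_time, rows, cols):
    # Everything is recomputed for every pixel, as in the original loop.
    out = []
    for i in rows:
        for j in cols:
            u = j / SIZE
            v = i / SIZE
            x = u - 0.5 + 0.3 * math.sin(i_time * 2)
            y = v - 0.5 + 0.3 * math.cos(i_time * 2)
            out.append(abs(math.sin(15 * math.hypot(x, y) - i_time * 5)))
    return out

def frame_hoisted(i_time, rows, cols):
    # Values that depend only on time are computed once per frame.
    i2 = i_time * 2
    i5 = i_time * 5
    x_shift = 0.3 * math.sin(i2) - 0.5
    y_shift = 0.3 * math.cos(i2) - 0.5
    out = []
    for i in rows:
        y = i / SIZE + y_shift  # depends only on the row, so once per row
        for j in cols:
            x = j / SIZE + x_shift
            out.append(abs(math.sin(15 * math.hypot(x, y) - i5)))
    return out
```

The two results can differ in the last few bits because the additions happen in a different order, so compare them with a small tolerance, not with exact equality.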

 
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
Re: Old Skool Plasma Effect
« Reply #8 on: May 24, 2021, 02:42:24 pm »
As I expected, moving i2, i5, and v outside the j loop now gives us 31 FPS on my machine:

Code: QB64:
'conversion of https://www.shadertoy.com/view/NllGRM
'in QB64 By Ashish for Richard
'24 May, 2021

_TITLE "PLASMA EFFECT"
SCREEN _NEWIMAGE(400, 400, 32)
TYPE vec3
    AS DOUBLE x, y, z
END TYPE
DIM clr AS vec3, final AS vec3

'Part 1: the original loop, with an FPS counter shown in the title bar
iTime = 0
time## = TIMER + 1
DO
    FOR i = 0 TO 400
        FOR j = 0 TO 400
            u = j / 400
            v = i / 400
            t = ABS(SIN(15 * ((u - 0.5 + 0.3 * SIN(iTime * 2)) ^ 2 + (v - 0.5 + 0.3 * COS(iTime * 2)) ^ 2) ^ 0.5 - iTime * 5))
            clr.x = t: clr.y = 1 - t: clr.z = 1 - t
            final = clr
            t = COS(u * 20 + iTime * 5) ^ 2
            clr.x = 1 - t: clr.y = t: clr.z = 0
            final.x = final.x + clr.x
            final.y = final.y + clr.y
            final.z = final.z + clr.z
            PSET (j, i), _RGB32(final.x * 255, final.y * 255, final.z * 255)
        NEXT j
    NEXT i
    fps = fps + 1
    IF TIMER > time## THEN _TITLE STR$(fps): fps = 0: time## = TIMER + 1
    _DISPLAY
    iTime = iTime + 0.05
LOOP UNTIL _KEYHIT 'assumed exit condition: press a key to move on to the tweaked loop


'Part 2: i2, i5 and the whole y term are now computed once per row
DIM m AS _MEM: m = _MEMIMAGE(0)

iTime = 0
time## = TIMER + 1
DO
    i = 0
    DO UNTIL i >= 400
        j = 0
        i2 = iTime * 2
        i5 = iTime * 5
        'v now holds the complete y term of the distance formula
        v = i / 400 - 0.5 + 0.3 * COS(i2)
        DO UNTIL j >= 400
            u = j / 400
            '            t = ABS(SIN(15 * ((u - 0.5 + 0.3 * SIN(i2)) ^ 2 + (v - 0.5 + 0.3 * COS(i2)) ^ 2) ^ 0.5 - i5))
            t = ABS(SIN(15 * _HYPOT(u - 0.5 + 0.3 * SIN(i2), v) - i5))
            final.x = t: final.z = 1 - t
            t = COS(u * 20 + i5) ^ 2
            clr.x = 1 - t: clr.y = t ': clr.z = 0
            final.x = final.x + clr.x
            final.y = final.z + clr.y
            '            final.z = final.z + clr.z
            $CHECKING:OFF
            _MEMPUT m, (m.OFFSET + i * 400 * 4) + j * 4, _RGB32(final.x * 255, final.y * 255, final.z * 255) AS LONG
            $CHECKING:ON
            '            PSET (j, i), _RGB32(final.x * 255, final.y * 255, final.z * 255)
            j = j + 1
        LOOP
        i = i + 1
    LOOP
    fps = fps + 1
    IF TIMER > time## THEN _TITLE STR$(fps): fps = 0: time## = TIMER + 1
    _DISPLAY
    iTime = iTime + 0.05
LOOP


So from 16, we've almost doubled performance to 31, just by moving some of the math around and using the _HYPOT command. 

If I've changed the behavior of the program, and it's not producing the exact same results as before, I can't see the difference with my poor eyes.  Everything seems to be working like it did before -- just faster for us now.  I'll leave it up to Ashish to determine if I "broke" something by tweaking it as I have.



And a little more math tweaking gives us the following at 36FPS on my machine:

Code: QB64:
'conversion of https://www.shadertoy.com/view/NllGRM
'in QB64 By Ashish for Richard
'24 May, 2021

_TITLE "PLASMA EFFECT"
SCREEN _NEWIMAGE(400, 400, 32)
TYPE vec3
    AS DOUBLE x, y, z
END TYPE
DIM clr AS vec3, final AS vec3

'Part 1: the original loop, with an FPS counter shown in the title bar
iTime = 0
time## = TIMER + 1
DO
    FOR i = 0 TO 400
        FOR j = 0 TO 400
            u = j / 400
            v = i / 400
            t = ABS(SIN(15 * ((u - 0.5 + 0.3 * SIN(iTime * 2)) ^ 2 + (v - 0.5 + 0.3 * COS(iTime * 2)) ^ 2) ^ 0.5 - iTime * 5))
            clr.x = t: clr.y = 1 - t: clr.z = 1 - t
            final = clr
            t = COS(u * 20 + iTime * 5) ^ 2
            clr.x = 1 - t: clr.y = t: clr.z = 0
            final.x = final.x + clr.x
            final.y = final.y + clr.y
            final.z = final.z + clr.z
            PSET (j, i), _RGB32(final.x * 255, final.y * 255, final.z * 255)
        NEXT j
    NEXT i
    fps = fps + 1
    IF TIMER > time## THEN _TITLE STR$(fps): fps = 0: time## = TIMER + 1
    _DISPLAY
    iTime = iTime + 0.05
LOOP UNTIL _KEYHIT 'assumed exit condition: press a key to move on to the tweaked loop


'Part 2: iTime now runs at twice the rate (step 0.1), so iTime replaces i2 and i5 = iTime * 2.5
DIM m AS _MEM: m = _MEMIMAGE(0)

iTime = 0
time## = TIMER + 1
DO
    i = 0
    DO UNTIL i >= 400
        j = 0
        i5 = iTime * 2.5
        v = i / 400 - 0.5 + 0.3 * COS(iTime)
        's is the x shift, added to u inside the j loop
        s = SIN(iTime) * 0.3 - 0.5
        DO UNTIL j >= 400
            u = j / 400
            '            t = ABS(SIN(15 * ((u - 0.5 + 0.3 * SIN(i2)) ^ 2 + (v - 0.5 + 0.3 * COS(i2)) ^ 2) ^ 0.5 - i5))
            t = ABS(SIN(15 * _HYPOT(u + s, v) - i5))
            final.x = t: final.z = 1 - t
            t = COS(u * 20 + i5) ^ 2
            clr.x = 1 - t: clr.y = t ': clr.z = 0
            final.x = final.x + clr.x
            final.y = final.z + clr.y
            '            final.z = final.z + clr.z
            $CHECKING:OFF
            _MEMPUT m, (m.OFFSET + i * 400 * 4) + j * 4, _RGB32(final.x * 255, final.y * 255, final.z * 255) AS LONG
            $CHECKING:ON
            'PSET (j, i), _RGB32(final.x * 255, final.y * 255, final.z * 255)
            j = j + 1
        LOOP
        i = i + 1
    LOOP
    fps = fps + 1
    IF TIMER > time## THEN _TITLE STR$(fps): fps = 0: time## = TIMER + 1
    _DISPLAY
    iTime = iTime + 0.1
LOOP

And I honestly don't see where there's anything else I can move out to tweak performance any more.  Our main math formula now looks like this:

            t = ABS(SIN(15 * _HYPOT(u + s, v) - i5))

Compared to the original which was:

            t = ABS(SIN(15 * ((u - 0.5 + 0.3 * SIN(iTime * 2)) ^ 2 + (v - 0.5 + 0.3 * COS(iTime * 2)) ^ 2) ^ 0.5 - iTime * 5))
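The last listing also doubles the clock (iTime steps by 0.1 instead of 0.05, and i5 becomes iTime * 2.5), so the two formulas should agree frame for frame. A quick numerical check, sketched in Python (not QB64) with the illustrative names `t_original` and `t_final`:

```python
import math

def t_original(u, v, i_time):
    # Original formula; i_time advances by 0.05 per frame.
    x = u - 0.5 + 0.3 * math.sin(i_time * 2)
    y = v - 0.5 + 0.3 * math.cos(i_time * 2)
    return abs(math.sin(15 * (x ** 2 + y ** 2) ** 0.5 - i_time * 5))

def t_final(u, row_v, i_time2):
    # Tweaked formula; i_time2 is the doubled clock, advancing by 0.1 per frame.
    i5 = i_time2 * 2.5
    v = row_v - 0.5 + 0.3 * math.cos(i_time2)
    s = math.sin(i_time2) * 0.3 - 0.5
    return abs(math.sin(15 * math.hypot(u + s, v) - i5))

# Largest difference over the first 40 frames for one sample pixel.
worst = max(abs(t_original(0.25, 0.6, n * 0.05) - t_final(0.25, 0.6, n * 0.1))
            for n in range(40))
```

Any remaining difference should be at the level of floating-point rounding, which fits the observation above that the picture looks unchanged.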

« Last Edit: May 24, 2021, 03:07:16 pm by SMcNeill »

Offline Dav

  • Forum Resident
  • Posts: 792
Re: Old Skool Plasma Effect
« Reply #9 on: May 24, 2021, 03:13:21 pm »
That's some cool tweaking Steve.  I played around some too, used @bplus LINE trick he showed me when tweaking my plasma ball game, and it doubled the speed.

- Dav


Code: QB64:
'conversion of https://www.shadertoy.com/view/NllGRM
'in QB64 By Ashish for Richard
'24 May, 2021
_TITLE "PLASMA EFFECT"
SCREEN _NEWIMAGE(400, 400, 32)
TYPE vec3
    AS SINGLE x, y, z
END TYPE
DIM clr AS vec3, final AS vec3
iTime = 0
DO
    'STEP 2 evaluates the formula only for every other row and column
    FOR i = 0 TO 400 STEP 2
        FOR j = 0 TO 400 STEP 2
            u = j / 400
            v = i / 400
            t = ABS(SIN(15 * ((u - 0.5 + 0.3 * SIN(iTime * 2)) ^ 2 + (v - 0.5 + 0.3 * COS(iTime * 2)) ^ 2) ^ 0.5 - iTime * 5))
            clr.x = t: clr.y = 1 - t: clr.z = 1 - t
            final = clr
            t = COS(u * 20 + iTime * 5) ^ 2
            clr.x = 1 - t: clr.y = t: clr.z = 0
            final.x = final.x + clr.x
            final.y = final.y + clr.y
            final.z = final.z + clr.z
            'PSET (j, i), _RGB32(final.x * 255, final.y * 255, final.z * 255)
            'fill the 2x2 block from (j, i) to (j + 1, i + 1) with one color
            LINE (j, i)-STEP(1, 1), _RGB32(final.x * 255, final.y * 255, final.z * 255), BF
        NEXT j
    NEXT i
    _DISPLAY
    iTime = iTime + 0.05
LOOP
« Last Edit: May 24, 2021, 03:20:21 pm by Dav »

Offline bplus

  • Global Moderator
  • Forum Resident
  • Posts: 8053
  • b = b + ...
Re: Old Skool Plasma Effect
« Reply #10 on: May 24, 2021, 03:18:00 pm »
Quote
That's some cool tweaking Steve.  I played around some too, used @bplus LINE trick he showed me when tweaking my plasma ball game, and it doubled the speed.

Ha! and guess who told me about that? Steve might know LOL!

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
Re: Old Skool Plasma Effect
« Reply #11 on: May 24, 2021, 03:24:50 pm »
I've plugged in the PSET, LINE, and _MEMPUT methods for testing, and I'm not seeing an increase with LINE BF, in this instance.  (Probably since we're originally just plotting a single point at a time.)

For PSET, I'm getting around 34FPS with the optimized code.  For LINE BF, it's around 31FPS.  For _MEM, it's around 36FPS.

Where LINE BF *really* makes a difference is when you're drawing a horizontal line across the screen. 

LINE (0,0)- STEP(100,0), color, BF is usually quite a bit faster than just LINE (0,0) - STEP(100,0), color.  For singular points though, it usually doesn't make too big of a difference, in my experience.   

Offline Dav

  • Forum Resident
  • Posts: 792
Re: Old Skool Plasma Effect
« Reply #12 on: May 24, 2021, 03:32:47 pm »
The LINE code I posted is kind of cheating, because it skips every other pixel (there's a STEP 2 on both FOR...NEXT loops).  So the resolution isn't quite as good as the original, but it's not that noticeable either, and speed is increased.  On my old laptop the speed is doubled.

- Dav
« Last Edit: May 24, 2021, 03:33:54 pm by Dav »

Offline bplus

  • Global Moderator
  • Forum Resident
  • Posts: 8053
  • b = b + ...
Re: Old Skool Plasma Effect
« Reply #13 on: May 24, 2021, 03:35:36 pm »
Yeah, I was going to say stepping by 2 cuts the time in half, plus it requires BF.

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
Re: Old Skool Plasma Effect
« Reply #14 on: May 24, 2021, 03:46:17 pm »
Quote
The LINE code I posted is kind of cheating, because it skips every other pixel (there's a STEP 2 on both FOR...NEXT loops).  So the resolution isn't quite as good as the original, but it's not that noticeable either, and speed is increased.  On my old laptop the speed is doubled.

- Dav

AHA!  That’s the sneaky ninja part I missed out on — STEP 2.  I was wondering exactly what type of magic machine you had at home where it’d draw a 2x2 block twice as fast as a single 1x1 pixel!

Makes perfect sense though, when you think of it as drawing a single 2x2 square versus four 1x1 pixels.  I just overlooked your stepping, which seems so obvious now that you mentioned it.  ;D
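To put a number on the stepping trick, here is a rough sketch in Python (not QB64) that counts how often the per-pixel formula runs with and without stepping. The names `render` and `shade` are hypothetical, and the sketch covers pixels 0 to 399, whereas the posted FOR loops run 0 TO 400:

```python
def render(size, step, shade):
    """Fill a size x size grid, calling shade once per step x step block."""
    grid = [[None] * size for _ in range(size)]
    evaluations = 0
    for i in range(0, size, step):
        for j in range(0, size, step):
            colour = shade(j / size, i / size)  # the expensive per-pixel math
            evaluations += 1
            # Equivalent of LINE (j, i)-STEP(step - 1, step - 1), colour, BF
            for y in range(i, min(i + step, size)):
                for x in range(j, min(j + step, size)):
                    grid[y][x] = colour
    return grid, evaluations

full_grid, full_count = render(400, 1, lambda u, v: u + v)
half_grid, half_count = render(400, 2, lambda u, v: u + v)
```

Stepping by 2 in both directions leaves a quarter of the formula evaluations. The reported speedup was closer to 2x than 4x, which would be consistent with the drawing and display work staying roughly the same, though that part is a guess.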