Topic: Is there a way to do this faster?


Petr
« on: August 06, 2019, 01:31:17 pm »
Hi. This SUB creates a "hole" in the software layer (an area with alpha 0), so that the hardware layer is visible through this area when _DISPLAYORDER _HARDWARE, _SOFTWARE is used. After adding this SUB to my program, the mouse reaction time becomes much slower. X1, Y1, X2, Y2 can be different on every call. Is there some way to make it faster?

_MEM is used here because it writes the pixels directly, without blending the alpha color with the existing software background. This SUB is for creating a transparent area in the software layer.
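The point about blending can be checked with a small sketch. It is only an illustration: the 64x64 image and the variable names are assumptions, and it uses _DONTBLEND with LINE in place of _MEM to write the alpha-0 color. With default blending the alpha should stay at 255, and with _DONTBLEND it should drop to 0.

```qb64
'Sketch: a blended draw with alpha 0 should leave pixels untouched,
'while a _DONTBLEND draw stores the alpha-0 color as-is.
'The image size and variable names are illustrative assumptions.
DIM img AS LONG, blended AS _UNSIGNED LONG, raw AS _UNSIGNED LONG
img = _NEWIMAGE(64, 64, 32)
_DEST img: _SOURCE img
CLS , _RGB32(255, 0, 0) 'opaque red background

LINE (0, 0)-(9, 9), _RGBA32(0, 0, 0, 0), BF 'default blending is on
blended = POINT(5, 5)

_DONTBLEND img 'write colors as-is, alpha included
LINE (0, 0)-(9, 9), _RGBA32(0, 0, 0, 0), BF
_BLEND img 'restore default blending
raw = POINT(5, 5)

_DEST 0 'print the results on the main screen
PRINT "alpha after blended draw:"; _ALPHA32(blended)
PRINT "alpha after _DONTBLEND draw:"; _ALPHA32(raw)
_FREEIMAGE img
```

If the second value is 0, the _DONTBLEND rectangle might be another way to cut the hole, though it has not been timed against the _MEM version here.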


Code: QB64:
SUB SCLS (x1 AS LONG, y1 AS LONG, x2 AS LONG, y2 AS LONG)
    a = _DEST
    DIM m AS _MEM, kolor2 AS _UNSIGNED LONG, x AS LONG, y AS LONG
    m = _MEMIMAGE(a)
    kolor2~& = &H00000000 'fully transparent pixel (alpha 0)
    IF y1 > y2 THEN SWAP y1, y2
    IF x1 > x2 THEN SWAP x1, x2

    'clamp the rectangle to the image to prevent out-of-range memory access
    IF x1 < 0 THEN x1 = 0
    IF y1 < 0 THEN y1 = 0
    IF x2 > _WIDTH(a) - 1 THEN x2 = _WIDTH(a) - 1
    IF y2 > _HEIGHT(a) - 1 THEN y2 = _HEIGHT(a) - 1
    '---------------------------
    'W() is an array from the rest of the program (not shown here)
    LINE (W(3).X1, W(3).Y1)-(W(3).X2, W(3).Y2), _RGB32(183, 194, 200), BF 'create border between software and hardware area

    y = y1
    x = x1
    DO UNTIL y > y2
        DO UNTIL x > x2
            _MEMPUT m, m.OFFSET + 4 * ((_WIDTH(a) * y) + x), kolor2~&
            x = x + 1
        LOOP
        x = x1: y = y + 1
    LOOP
    _MEMFREE m
END SUB

SMcNeill
« Reply #1 on: August 06, 2019, 02:27:34 pm »
Move the math outside the loops as much as possible.  See the difference in performance between these versions:

Code: QB64:
SCREEN _NEWIMAGE(640, 480, 32)

CLS , &HFFF0F000
t## = TIMER
FOR i = 1 TO 10000
    Original_SCLS 100, 100, 200, 200
NEXT
t1## = TIMER
FOR i = 1 TO 10000
    Steve_SCLS 100, 100, 200, 200
NEXT
t2## = TIMER

PRINT USING "##.##### seconds for original"; t1## - t##
PRINT USING "##.##### seconds for Steve"; t2## - t1##


SUB Steve_SCLS (x1 AS LONG, y1 AS LONG, x2 AS LONG, y2 AS LONG)
    a = _DEST
    STATIC m AS _MEM
    DIM x AS LONG, y AS LONG
    DIM w AS LONG, o AS _OFFSET
    m = _MEMIMAGE(a)
    kolor2~& = &H00000000
    IF y1 > y2 THEN SWAP y1, y2
    IF x1 > x2 THEN SWAP x1, x2

    'memory out of range prevent
    IF x1 < 0 THEN x1 = 0
    IF y1 < 0 THEN y1 = 0
    w = _WIDTH(a): h = _HEIGHT(a) 'read the image size once instead of per pixel
    IF x2 > w - 1 THEN x2 = w - 1
    IF y2 > h - 1 THEN y2 = h - 1
    '---------------------------
    '   LINE (W(3).X1, W(3).Y1)-(W(3).X2, W(3).Y2), _RGB32(183, 194, 200), BF 'create border between software and hardware area

    'convert rows and columns to byte offsets (4 bytes per pixel)
    y = y1 * w * 4
    y2 = y2 * w * 4
    x1 = x1 * 4
    x2 = x2 * 4
    x = x1
    w = w * 4 'w is now the number of bytes in one row
    DO UNTIL y > y2
        o = m.OFFSET + y 'start of the current row, computed once per row
        DO UNTIL x > x2
            _MEMPUT m, o + x, 0 AS _UNSIGNED LONG
            x = x + 4
        LOOP
        x = x1: y = y + w
    LOOP
    '_MEMFREE m 'No need to memfree as we keep the memblock available to point to other images in the future
END SUB


SUB Original_SCLS (x1 AS LONG, y1 AS LONG, x2 AS LONG, y2 AS LONG)
    a = _DEST
    DIM m AS _MEM, kolor2 AS _UNSIGNED LONG, x AS LONG, y AS LONG
    m = _MEMIMAGE(a)
    kolor2~& = &H00000000
    IF y1 > y2 THEN SWAP y1, y2
    IF x1 > x2 THEN SWAP x1, x2

    'memory out of range prevent
    IF x1 < 0 THEN x1 = 0
    IF y1 < 0 THEN y1 = 0
    IF x2 > _WIDTH(a) - 1 THEN x2 = _WIDTH(a) - 1
    IF y2 > _HEIGHT(a) - 1 THEN y2 = _HEIGHT(a) - 1
    '---------------------------
    'LINE (W(3).X1, W(3).Y1)-(W(3).X2, W(3).Y2), _RGB32(183, 194, 200), BF 'create border between software and hardware area

    y = y1
    x = x1
    DO UNTIL y > y2
        DO UNTIL x > x2
            _MEMPUT m, m.OFFSET + 4 * ((_WIDTH(a) * y) + x), kolor2~&
            x = x + 1
        LOOP
        x = x1: y = y + 1
    LOOP
    _MEMFREE m
END SUB


Notice the difference in the complexity of the math between both of our innermost loops:

_MEMPUT m, m.OFFSET + 4 * ((_WIDTH(a) * y) + x), kolor2~&

vs

_MEMPUT m, o + x, 0 AS _UNSIGNED LONG
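Both lines address the same bytes; only the amount of work per pixel differs. As a minimal sketch, assuming a 32-bit image 640 pixels wide (4 bytes per pixel), the loop below compares the per-pixel formula with a row offset that is computed once per row. The names imgWidth, rowBytes, rowOffset, slow and fast are made up for this illustration and do not appear in the code above.

```qb64
'Sketch: compare the per-pixel offset formula with a hoisted row offset.
'Assumes a 32-bit image 640 pixels wide; all names here are illustrative.
CONST imgWidth = 640
DIM x AS LONG, y AS LONG
DIM rowBytes AS LONG, rowOffset AS LONG
DIM slow AS LONG, fast AS LONG, mismatches AS LONG

rowBytes = imgWidth * 4 'bytes in one image row
FOR y = 100 TO 200
    rowOffset = y * rowBytes 'computed once per row, outside the inner loop
    FOR x = 100 TO 200
        slow = 4 * ((imgWidth * y) + x) 'original: two multiplies for every pixel
        fast = rowOffset + x * 4 'hoisted: the row part is already known
        IF slow <> fast THEN mismatches = mismatches + 1
    NEXT
NEXT
PRINT "mismatched offsets:"; mismatches
```

The faster SUB goes one step further by stepping x in units of 4 bytes, so the inner loop needs only an addition.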
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

Petr
« Reply #2 on: August 06, 2019, 02:34:36 pm »
This is an absolutely incredible boost in performance! Thank you so much for your help, Steve! I have the gift of slowing down anything ...

SMcNeill
« Reply #3 on: August 06, 2019, 02:44:20 pm »
And, let's swap out our _MEMPUT command for a _MEMFILL and see how it performs for us:

Code: QB64:
SCREEN _NEWIMAGE(640, 480, 32)

CLS , &HFFF0F000
t## = TIMER
FOR i = 1 TO 10000
    Original_SCLS 100, 100, 200, 200
NEXT
t1## = TIMER
CLS , &HFFF0F000
FOR i = 1 TO 1000000
    Steve_SCLS 100, 100, 200, 200
NEXT
t2## = TIMER

PRINT USING "##.######## seconds for original"; t1## - t##
PRINT USING "##.######## seconds for Steve"; t2## - t1##
PRINT "Notice the size of our FOR loops -- Steve_SCLS is running 100 times more,"
PRINT "with very little time difference!"


SUB Steve_SCLS (x1 AS LONG, y1 AS LONG, x2 AS LONG, y2 AS LONG)
    a = _DEST
    STATIC m AS _MEM
    DIM x AS LONG, y AS LONG
    DIM w AS LONG, o AS _OFFSET
    m = _MEMIMAGE(a)
    kolor2~& = &H00000000
    IF y1 > y2 THEN SWAP y1, y2
    IF x1 > x2 THEN SWAP x1, x2

    'memory out of range prevent
    IF x1 < 0 THEN x1 = 0
    IF y1 < 0 THEN y1 = 0
    w = _WIDTH(a): h = _HEIGHT(a)
    IF x2 > w - 1 THEN x2 = w - 1
    IF y2 > h - 1 THEN y2 = h - 1
    '---------------------------
    '   LINE (W(3).X1, W(3).Y1)-(W(3).X2, W(3).Y2), _RGB32(183, 194, 200), BF 'create border between software and hardware area

    y = y1 * w * 4 + x1 * 4 'byte offset of the first pixel in the first row
    y2 = y2 * w * 4 + x1 * 4 'same for the last row; x1 is included so the last row is not skipped
    xstop = (x2 - x1 + 1) * 4 'bytes to fill per row; +1 because x2 is inclusive
    w = w * 4 'w is now the number of bytes in one row
    DO UNTIL y > y2
        o = m.OFFSET + y
        _MEMFILL m, o, xstop, 0 AS _UNSIGNED _BYTE 'zero one whole row of the rectangle in a single call
        y = y + w
    LOOP
    '_MEMFREE m 'No need to memfree as we keep the memblock available to point to other images in the future
END SUB


SUB Original_SCLS (x1 AS LONG, y1 AS LONG, x2 AS LONG, y2 AS LONG)
    a = _DEST
    DIM m AS _MEM, kolor2 AS _UNSIGNED LONG, x AS LONG, y AS LONG
    m = _MEMIMAGE(a)
    kolor2~& = &H00000000
    IF y1 > y2 THEN SWAP y1, y2
    IF x1 > x2 THEN SWAP x1, x2

    'memory out of range prevent
    IF x1 < 0 THEN x1 = 0
    IF y1 < 0 THEN y1 = 0
    IF x2 > _WIDTH(a) - 1 THEN x2 = _WIDTH(a) - 1
    IF y2 > _HEIGHT(a) - 1 THEN y2 = _HEIGHT(a) - 1
    '---------------------------
    'LINE (W(3).X1, W(3).Y1)-(W(3).X2, W(3).Y2), _RGB32(183, 194, 200), BF 'create border between software and hardware area

    y = y1
    x = x1
    DO UNTIL y > y2
        DO UNTIL x > x2
            _MEMPUT m, m.OFFSET + 4 * ((_WIDTH(a) * y) + x), kolor2~&
            x = x + 1
        LOOP
        x = x1: y = y + 1
    LOOP
    _MEMFREE m
END SUB

Careful application of math and use of _MEMFILL gives us a routine about 80 times faster than the original.  :D
« Last Edit: August 06, 2019, 02:45:52 pm by SMcNeill »

Petr
« Reply #4 on: August 06, 2019, 02:58:13 pm »
Yeah, I see: you calculate the area of memory to be filled and then fill it row by row, writing each row as one block. I had been wondering how to do it this way (my version writes point by point). Well, this is a rocket. Much more usable than my version. Thanks!
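The row-by-row idea can be sketched on a small off-screen image. This is only an illustration under stated assumptions: ClearRect, the 64x48 image and all variable names are hypothetical, the rectangle is assumed to be ordered and inside the image, and the memory block is freed on every call to keep the example simple.

```qb64
'Sketch: clear an inclusive rectangle row by row with _MEMFILL, then count
'the transparent pixels. ClearRect and all variable names are illustrative.
DIM img AS LONG, cleared AS LONG, px AS LONG, py AS LONG
img = _NEWIMAGE(64, 48, 32)
_DEST img
CLS , _RGB32(255, 255, 255) 'opaque white background
ClearRect img, 10, 5, 20, 8

_SOURCE img 'POINT reads from the source image
FOR py = 0 TO 47
    FOR px = 0 TO 63
        IF _ALPHA32(POINT(px, py)) = 0 THEN cleared = cleared + 1
    NEXT
NEXT
_DEST 0 'print the result on the main screen
PRINT "transparent pixels:"; cleared 'the rectangle is 11 columns x 4 rows = 44 pixels

SUB ClearRect (img AS LONG, x1 AS LONG, y1 AS LONG, x2 AS LONG, y2 AS LONG)
    'Assumes x1 <= x2, y1 <= y2 and that the rectangle lies inside the image.
    DIM m AS _MEM, row AS LONG
    DIM rowBytes AS LONG, fillBytes AS LONG
    m = _MEMIMAGE(img)
    rowBytes = _WIDTH(img) * 4 '4 bytes per pixel in a 32-bit image
    fillBytes = (x2 - x1 + 1) * 4 '+1 because x2 is inclusive
    FOR row = y1 TO y2
        'zero one row of the rectangle in a single call
        _MEMFILL m, m.OFFSET + row * rowBytes + x1 * 4, fillBytes, 0 AS _UNSIGNED _BYTE
    NEXT
    _MEMFREE m
END SUB
```

Counting the pixels afterwards is a cheap way to confirm that the last row and the last column of the rectangle are included.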