Author Topic: DIM a (Max Index)  (Read 15148 times)

0 Members and 1 Guest are viewing this topic.

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
    • View Profile
    • Steve’s QB64 Archive Forum
Re: DIM a (Max Index)
« Reply #30 on: October 17, 2018, 09:39:10 pm »
The numbers I have,
is in thes style;
strigs;
q1
q2
q3
....
q33
NOT q(1) etccc
The value in those strings are in the range 1-255

Can you share an example portion of the datafile/code?  Like Codeguy, I'm lost as to how you can have both byte values and 33-bit values stored sequentially.  How do you know how many bits/bytes to assign to a value?  What delimiters are used?  What structure does this data have?

More details are needed, if we're going to try and sort out a way to help speed things up for you.
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

Offline codeguy

  • Forum Regular
  • Posts: 174
    • View Profile
Re: DIM a (Max Index)
« Reply #31 on: October 17, 2018, 10:42:49 pm »
if you ever get this figured out, here's a decent in-place algorithm to use:
performance (millisecond) profile for DOUBLE precision array until 134M+ elements :
 1 0
 3 0
 7 0
 15 0
 31 0
 63 0
 127 0
 255 0
 511 0
 1023 .9765625
 2047 .9765625
 4095 2.929688
 8191 5.859375
 16383 17.08984
 32767 51.26953
 65535 136.2305
 131071 241.6992
 262143 415.0391
 524287 946.7773
 1048575 1986.816
 2097151 4613.77
 4194303 9541.016
 8388607 18500.98
 16777215 42821.29
 33554431 84074.22
 67108863 186534.7
 134217727 379580.1

demo code:
Code: QB64: [Select]
  1. n& = 0
  2.     n& = n& + 1
  3.     REDIM a(0 TO n&) AS DOUBLE '* 65s
  4.     DIM u AS LONG
  5.     FOR u = LBOUND(a) TO UBOUND(a)
  6.         a(u) = RND
  7.     NEXT
  8.     s! = TIMER(.001)
  9.     QuickSortIntrospective a(), LBOUND(a), UBOUND(a), 1
  10.     f! = TIMER(.001)
  11.     'd$ = "." + STRING$(15, "#")
  12.     DIM l AS LONG
  13.     l = LBOUND(a)
  14.     FOR u = LBOUND(a) TO UBOUND(a)
  15.         IF a(u) < a(l) THEN STOP
  16.         'PRINT USING d$; a(u);
  17.         l = u
  18.     NEXT
  19.     _CLIPBOARD$ = _CLIPBOARD$ + STR$(n&) + STR$(1000 * (f! - s!)) + CHR$(13) + CHR$(10)
  20.     n& = n& * 2
  21.  

Complete IntroSort, in-place (uses no auxiliary) and combines QuickSort, HeapSort and InsertionSort. This will allow the maximum range of in-memory sorting in reasonable time.
Code: QB64: [Select]
  1. '****************************************
  2. '* The IntroSort() algorithm extended to QBxx because there is no qbxx-compatible code
  3. '* The IntroSort algorithm extended to qb64 because i could find no pure qbxx code
  4. '* 03Jun2017, by CodeGuy -- further mods for use in sorting library 03 Aug 2017
  5. '* Introspective Sort (IntroSort) falls back to MergeSort after so many levels of
  6. '* recursion and is good for highly redundant data (aka few unique)
  7. '* for very short runs, it falls back to InsertionSort.
  8.  
  9. SUB QuickSortIntrospective (CGSortLibArr() AS DOUBLE, IntroSort_start AS LONG, IntroSort_finish AS LONG, order AS INTEGER)
  10.     DIM IntroSort_i AS LONG
  11.     DIM IntroSort_J AS LONG
  12.     STATIC IntroSort_level AS INTEGER
  13.     STATIC IntroSort_MaxRecurseLevel AS INTEGER
  14.     IntroSort_MaxRecurseLevel = 15
  15.     IF IntroSort_start < IntroSort_finish THEN
  16.         IF IntroSort_finish - IntroSort_start > 31 THEN
  17.             IF IntroSort_level > IntroSort_MaxRecurseLevel THEN
  18.                 HeapSort CGSortLibArr(), IntroSort_start, IntroSort_finish, order
  19.             ELSE
  20.                 IntroSort_level = IntroSort_level + 1
  21.                 QuickSortIJ CGSortLibArr(), IntroSort_start, IntroSort_finish, IntroSort_i, IntroSort_J, order
  22.                 QuickSortIntrospective CGSortLibArr(), IntroSort_start, IntroSort_J, order
  23.                 QuickSortIntrospective CGSortLibArr(), IntroSort_i, IntroSort_finish, order
  24.                 IntroSort_level = IntroSort_level - 1
  25.             END IF
  26.         ELSE
  27.             InsertionSort CGSortLibArr(), IntroSort_start, IntroSort_finish, order
  28.         END IF
  29.     END IF
  30.  
  31. SUB QuickSortIJ (CGSortLibArr() AS DOUBLE, start AS LONG, finish AS LONG, i AS LONG, j AS LONG, order AS INTEGER)
  32.     DIM Compare AS DOUBLE '* MUST be the same type as CGSortLibArr()
  33.     i = start
  34.     j = finish
  35.     Compare = CGSortLibArr(i + (j - i) \ 2)
  36.     SELECT CASE order
  37.         CASE 1
  38.             DO
  39.                 DO WHILE CGSortLibArr(i) < Compare
  40.                     i = i + 1
  41.                 LOOP
  42.                 DO WHILE CGSortLibArr(j) > Compare
  43.                     j = j - 1
  44.                 LOOP
  45.                 IF i <= j THEN
  46.                     IF i <> j THEN
  47.                         SWAP CGSortLibArr(i), CGSortLibArr(j)
  48.                     END IF
  49.                     i = i + 1
  50.                     j = j - 1
  51.                 END IF
  52.             LOOP UNTIL i > j
  53.         CASE ELSE
  54.             DO
  55.                 DO WHILE CGSortLibArr(i) > Compare
  56.                     i = i + 1
  57.                 LOOP
  58.                 DO WHILE CGSortLibArr(j) < Compare
  59.                     j = j - 1
  60.                 LOOP
  61.                 IF i <= j THEN
  62.                     IF i <> j THEN
  63.                         SWAP CGSortLibArr(i), CGSortLibArr(j)
  64.                     END IF
  65.                     i = i + 1
  66.                     j = j - 1
  67.                 END IF
  68.             LOOP UNTIL i > j
  69.     END SELECT
  70.  
  71. SUB InsertionSort (CGSortLibArr() AS DOUBLE, start AS LONG, finish AS LONG, order AS INTEGER)
  72.     DIM InSort_Local_ArrayTemp AS DOUBLE
  73.     DIM InSort_Local_i AS LONG
  74.     DIM InSort_Local_j AS LONG
  75.     SELECT CASE order
  76.         CASE 1
  77.             FOR InSort_Local_i = start + 1 TO finish
  78.                 InSort_Local_j = InSort_Local_i - 1
  79.                 IF CGSortLibArr(InSort_Local_i) < CGSortLibArr(InSort_Local_j) THEN
  80.                     InSort_Local_ArrayTemp = CGSortLibArr(InSort_Local_i)
  81.                     DO UNTIL InSort_Local_j < start
  82.                         IF (InSort_Local_ArrayTemp < CGSortLibArr(InSort_Local_j)) THEN
  83.                             CGSortLibArr(InSort_Local_j + 1) = CGSortLibArr(InSort_Local_j)
  84.                             InSort_Local_j = InSort_Local_j - 1
  85.                         ELSE
  86.                             EXIT DO
  87.                         END IF
  88.                     LOOP
  89.                     CGSortLibArr(InSort_Local_j + 1) = InSort_Local_ArrayTemp
  90.                 END IF
  91.             NEXT
  92.         CASE ELSE
  93.             FOR InSort_Local_i = start + 1 TO finish
  94.                 InSort_Local_j = InSort_Local_i - 1
  95.                 IF CGSortLibArr(InSort_Local_i) > CGSortLibArr(InSort_Local_j) THEN
  96.                     InSort_Local_ArrayTemp = CGSortLibArr(InSort_Local_i)
  97.                     DO UNTIL InSort_Local_j < start
  98.                         IF (InSort_Local_ArrayTemp > CGSortLibArr(InSort_Local_j)) THEN
  99.                             CGSortLibArr(InSort_Local_j + 1) = CGSortLibArr(InSort_Local_j)
  100.                             InSort_Local_j = InSort_Local_j - 1
  101.                         ELSE
  102.                             EXIT DO
  103.                         END IF
  104.                     LOOP
  105.                     CGSortLibArr(InSort_Local_j + 1) = InSort_Local_ArrayTemp
  106.                 END IF
  107.             NEXT
  108.     END SELECT
  109.  
  110. SUB HeapSort (CGSortLibArr() AS DOUBLE, start AS LONG, finish AS LONG, order AS INTEGER)
  111.     DIM i AS LONG
  112.     DIM j AS LONG
  113.     FOR i = start + 1 TO finish
  114.         PercolateUp CGSortLibArr(), start, i, order
  115.     NEXT i
  116.  
  117.     FOR i = finish TO start + 1 STEP -1
  118.         SWAP CGSortLibArr(start), CGSortLibArr(i)
  119.         PercolateDown CGSortLibArr(), start, i - 1, order
  120.     NEXT i
  121.  
  122. SUB PercolateDown (CGSortLibArr() AS DOUBLE, start AS LONG, MaxLevel AS LONG, order AS INTEGER)
  123.     DIM i AS LONG
  124.     DIM child AS LONG
  125.     DIM ax AS LONG
  126.     i = start
  127.     '* Move the value in GetPixel!!!(start) down the heap until it has
  128.     '* reached its proper node (that is, until it is less than its parent
  129.     '* node or until it has reached MaxLevel!!!, the bottom of the current heap):
  130.     DO
  131.         child = 2 * (i - start) + start ' Get the subscript for the Child node.
  132.         '* Reached the bottom of the heap, so exit this procedure:
  133.         IF child > MaxLevel THEN EXIT DO
  134.         SELECT CASE order
  135.             CASE 1
  136.                 '* If there are two Child nodes, find out which one is bigger:
  137.                 ax = child + 1
  138.                 IF ax <= MaxLevel THEN
  139.                     IF CGSortLibArr(ax) > CGSortLibArr(child) THEN
  140.                         child = ax
  141.                     END IF
  142.                 END IF
  143.  
  144.                 '* Move the value down if it is still not bigger than either one of
  145.                 '* its Child!!!ren:
  146.                 IF CGSortLibArr(i) < CGSortLibArr(child) THEN
  147.                     SWAP CGSortLibArr(i), CGSortLibArr(child)
  148.                     i = child
  149.                 ELSE
  150.                     '* Otherwise, CGSortLibArr() has been restored to a heap from start to MaxLevel!!!,
  151.                     '* so exit:
  152.                     EXIT DO
  153.                 END IF
  154.             CASE ELSE
  155.                 '* If there are two Child nodes, find out which one is smaller:
  156.                 ax = child + 1
  157.                 IF ax <= MaxLevel THEN
  158.                     IF CGSortLibArr(ax) < CGSortLibArr(child) THEN
  159.                         child = ax
  160.                     END IF
  161.                 END IF
  162.                 '* Move the value down if it is still not smaller than either one of
  163.                 '* its Child!!!ren:
  164.                 IF CGSortLibArr(i) > CGSortLibArr(child) THEN
  165.                     SWAP CGSortLibArr(i), CGSortLibArr(child)
  166.                     i = child
  167.                 ELSE
  168.                     '* Otherwise, CGSortLibArr() has been restored to a heap from start to MaxLevel!!!,
  169.                     '* so exit:
  170.                     EXIT DO
  171.                 END IF
  172.         END SELECT
  173.     LOOP
  174.  
  175. SUB PercolateUp (CGSortLibArr() AS DOUBLE, start AS LONG, MaxLevel AS LONG, order AS INTEGER)
  176.     DIM parent AS LONG
  177.     DIM i AS LONG
  178.     SELECT CASE order
  179.         CASE 1
  180.             i = MaxLevel
  181.             '* Move the value in CGSortLibArr(MaxLevel!!!) up the heap until it has
  182.             '* reached its proper node (that is, until it is greater than either
  183.             '* of its Child!!! nodes, or until it has reached 1, the top of the heap):
  184.             DO UNTIL i = start
  185.                 '* Get the subscript for the parent node.
  186.                 parent = start + (i - start) \ 2
  187.                 '* The value at the current node is still bigger than the value at
  188.                 '* its parent node, so swap these two array elements:
  189.                 IF CGSortLibArr(i) > CGSortLibArr(parent) THEN
  190.                     SWAP CGSortLibArr(parent), CGSortLibArr(i)
  191.                     i = parent
  192.                 ELSE
  193.                     '* Otherwise, the element has reached its proper place in the heap,
  194.                     '* so exit this procedure:
  195.                     EXIT DO
  196.                 END IF
  197.             LOOP
  198.         CASE ELSE
  199.             i = MaxLevel
  200.             '* Move the value in CGSortLibArr(MaxLevel!!!) up the heap until it has
  201.             '* reached its proper node (that is, until it is greater than either
  202.             '* of its Child!!! nodes, or until it has reached 1, the top of the heap):
  203.             DO UNTIL i = start
  204.                 '* Get the subscript for the parent node.
  205.                 parent = start + (i - start) \ 2
  206.                 '* The value at the current node is still smaller than the value at
  207.                 '* its parent node, so swap these two array elements:
  208.                 IF CGSortLibArr(i) < CGSortLibArr(parent) THEN
  209.                     SWAP CGSortLibArr(parent), CGSortLibArr(i)
  210.                     i = parent
  211.                 ELSE
  212.                     '* Otherwise, the element has reached its proper place in the heap,
  213.                     '* so exit this procedure:
  214.                     EXIT DO
  215.                 END IF
  216.             LOOP
  217.     END SELECT
  218.  
IntroSort, completely in-place. Don't forget to change AS LONG to as _INTEGER64 to avoid overflow problems on indexes and change types as appropriate.


code to use _UNSIGNED LONG for indexes into arrays
Code: QB64: [Select]
  1. testn = 0
  2. '_CLIPBOARD$ = ""
  3.     testn = testn + 1
  4.     REDIM a(0 TO testn) AS DOUBLE '* 65s
  5.     FOR u = LBOUND(a) TO UBOUND(a)
  6.         a(u) = RND
  7.     NEXT
  8.     s! = TIMER(.001)
  9.     QuickSortIntrospective a(), LBOUND(a), UBOUND(a), 1
  10.     f! = TIMER(.001)
  11.     'd$ = "." + STRING$(15, "#")
  12.     DIM lp AS _UNSIGNED LONG
  13.     lp = LBOUND(a)
  14.     FOR u = LBOUND(a) TO UBOUND(a)
  15.         IF a(u) < a(lp) THEN STOP
  16.         'PRINT USING d$; a(u);
  17.         lp = u
  18.     NEXT
  19.     ''_CLIPBOARD$ = _CLIPBOARD$ + STR$(testn) + STR$(1000 * (f! - s!)) + CHR$(13) + CHR$(10)
  20.     PRINT STR$(testn); STR$(1000 * (f! - s!))
  21.     testn = testn * 2
  22.  
  23. '****************************************
  24. '* The IntroSort() algorithm extended to QBxx because there is no qbxx-compatible code
  25. '* The IntroSort algorithm extended to qb64 because i could find no pure qbxx code
  26. '* 03Jun2017, by CodeGuy -- further mods for use in sorting library 03 Aug 2017
  27. '* Introspective Sort (IntroSort) falls back to MergeSort after so many levels of
  28. '* recursion and is good for highly redundant data (aka few unique)
  29. '* for very short runs, it falls back to InsertionSort.
  30.  
  31. SUB QuickSortIntrospective (CGSortLibArr() AS DOUBLE, IntroSort_start AS _UNSIGNED LONG, IntroSort_finish AS _UNSIGNED LONG, order AS INTEGER)
  32.     DIM IntroSort_i AS _UNSIGNED LONG
  33.     DIM IntroSort_J AS _UNSIGNED LONG
  34.     STATIC IntroSort_level AS INTEGER
  35.     STATIC IntroSort_MaxRecurseLevel AS INTEGER
  36.     IntroSort_MaxRecurseLevel = 15
  37.     IF IntroSort_start < IntroSort_finish THEN
  38.         IF IntroSort_finish - IntroSort_start > 31 THEN
  39.             IF IntroSort_level > IntroSort_MaxRecurseLevel THEN
  40.                 HeapSort CGSortLibArr(), IntroSort_start, IntroSort_finish, order
  41.             ELSE
  42.                 IntroSort_level = IntroSort_level + 1
  43.                 QuickSortIJ CGSortLibArr(), IntroSort_start, IntroSort_finish, IntroSort_i, IntroSort_J, order
  44.                 QuickSortIntrospective CGSortLibArr(), IntroSort_start, IntroSort_J, order
  45.                 QuickSortIntrospective CGSortLibArr(), IntroSort_i, IntroSort_finish, order
  46.                 IntroSort_level = IntroSort_level - 1
  47.             END IF
  48.         ELSE
  49.             InsertionSort CGSortLibArr(), IntroSort_start, IntroSort_finish, order
  50.         END IF
  51.     END IF
  52.  
  53. SUB QuickSortIJ (CGSortLibArr() AS DOUBLE, start AS _UNSIGNED LONG, finish AS _UNSIGNED LONG, i AS _UNSIGNED LONG, j AS _UNSIGNED LONG, order AS INTEGER)
  54.     DIM Compare AS DOUBLE '* MUST be the same type as CGSortLibArr()
  55.     i = start
  56.     j = finish
  57.     Compare = CGSortLibArr(i + (j - i) \ 2)
  58.     SELECT CASE order
  59.         CASE 1
  60.             DO
  61.                 DO WHILE CGSortLibArr(i) < Compare
  62.                     i = i + 1
  63.                 LOOP
  64.                 DO WHILE CGSortLibArr(j) > Compare
  65.                     j = j - 1
  66.                 LOOP
  67.                 IF i <= j THEN
  68.                     IF i <> j THEN
  69.                         SWAP CGSortLibArr(i), CGSortLibArr(j)
  70.                     END IF
  71.                     i = i + 1
  72.                     j = j - 1
  73.                 END IF
  74.             LOOP UNTIL i > j
  75.         CASE ELSE
  76.             DO
  77.                 DO WHILE CGSortLibArr(i) > Compare
  78.                     i = i + 1
  79.                 LOOP
  80.                 DO WHILE CGSortLibArr(j) < Compare
  81.                     j = j - 1
  82.                 LOOP
  83.                 IF i <= j THEN
  84.                     IF i <> j THEN
  85.                         SWAP CGSortLibArr(i), CGSortLibArr(j)
  86.                     END IF
  87.                     i = i + 1
  88.                     j = j - 1
  89.                 END IF
  90.             LOOP UNTIL i > j
  91.     END SELECT
  92.  
  93. SUB InsertionSort (CGSortLibArr() AS DOUBLE, start AS _UNSIGNED LONG, finish AS _UNSIGNED LONG, order AS INTEGER)
  94.     DIM InSort_Local_ArrayTemp AS DOUBLE
  95.     DIM InSort_Local_i AS _UNSIGNED LONG
  96.     DIM InSort_Local_j AS _UNSIGNED LONG
  97.     SELECT CASE order
  98.         CASE 1
  99.             FOR InSort_Local_i = start + 1 TO finish
  100.                 InSort_Local_j = InSort_Local_i - 1
  101.                 IF CGSortLibArr(InSort_Local_i) < CGSortLibArr(InSort_Local_j) THEN
  102.                     InSort_Local_ArrayTemp = CGSortLibArr(InSort_Local_i)
  103.                     DO WHILE InSort_Local_j + 1 > start
  104.                         IF (InSort_Local_ArrayTemp < CGSortLibArr(InSort_Local_j)) THEN
  105.                             CGSortLibArr(InSort_Local_j + 1) = CGSortLibArr(InSort_Local_j)
  106.                             InSort_Local_j = InSort_Local_j - 1
  107.                         ELSE
  108.                             EXIT DO
  109.                         END IF
  110.                     LOOP
  111.                     CGSortLibArr(InSort_Local_j + 1) = InSort_Local_ArrayTemp
  112.                 END IF
  113.             NEXT
  114.         CASE ELSE
  115.             FOR InSort_Local_i = start + 1 TO finish
  116.                 InSort_Local_j = InSort_Local_i - 1
  117.                 IF CGSortLibArr(InSort_Local_i) > CGSortLibArr(InSort_Local_j) THEN
  118.                     InSort_Local_ArrayTemp = CGSortLibArr(InSort_Local_i)
  119.                     DO WHILE InSort_Local_j + 1 > start
  120.                         IF (InSort_Local_ArrayTemp > CGSortLibArr(InSort_Local_j)) THEN
  121.                             CGSortLibArr(InSort_Local_j + 1) = CGSortLibArr(InSort_Local_j)
  122.                             InSort_Local_j = InSort_Local_j - 1
  123.                         ELSE
  124.                             EXIT DO
  125.                         END IF
  126.                     LOOP
  127.                     CGSortLibArr(InSort_Local_j + 1) = InSort_Local_ArrayTemp
  128.                 END IF
  129.             NEXT
  130.     END SELECT
  131.  
  132. SUB HeapSort (CGSortLibArr() AS DOUBLE, start AS _UNSIGNED LONG, finish AS _UNSIGNED LONG, order AS INTEGER)
  133.     FOR i = start + 1 TO finish
  134.         PercolateUp CGSortLibArr(), start, i, order
  135.     NEXT i
  136.  
  137.     FOR i = finish TO start + 1 STEP -1
  138.         SWAP CGSortLibArr(start), CGSortLibArr(i)
  139.         PercolateDown CGSortLibArr(), start, i - 1, order
  140.     NEXT i
  141.  
  142. SUB PercolateDown (CGSortLibArr() AS DOUBLE, start AS _UNSIGNED LONG, MaxLevel AS _UNSIGNED LONG, order AS INTEGER)
  143.     DIM child AS _UNSIGNED LONG
  144.     DIM ax AS _UNSIGNED LONG
  145.     i = start
  146.     '* Move the value in GetPixel!!!(start) down the heap until it has
  147.     '* reached its proper node (that is, until it is less than its parent
  148.     '* node or until it has reached MaxLevel!!!, the bottom of the current heap):
  149.     DO
  150.         child = 2 * (i - start) + start ' Get the subscript for the Child node.
  151.         '* Reached the bottom of the heap, so exit this procedure:
  152.         IF child > MaxLevel THEN EXIT DO
  153.         SELECT CASE order
  154.             CASE 1
  155.                 '* If there are two Child nodes, find out which one is bigger:
  156.                 ax = child + 1
  157.                 IF ax <= MaxLevel THEN
  158.                     IF CGSortLibArr(ax) > CGSortLibArr(child) THEN
  159.                         child = ax
  160.                     END IF
  161.                 END IF
  162.  
  163.                 '* Move the value down if it is still not bigger than either one of
  164.                 '* its Child!!!ren:
  165.                 IF CGSortLibArr(i) < CGSortLibArr(child) THEN
  166.                     SWAP CGSortLibArr(i), CGSortLibArr(child)
  167.                     i = child
  168.                 ELSE
  169.                     '* Otherwise, CGSortLibArr() has been restored to a heap from start to MaxLevel!!!,
  170.                     '* so exit:
  171.                     EXIT DO
  172.                 END IF
  173.             CASE ELSE
  174.                 '* If there are two Child nodes, find out which one is smaller:
  175.                 ax = child + 1
  176.                 IF ax <= MaxLevel THEN
  177.                     IF CGSortLibArr(ax) < CGSortLibArr(child) THEN
  178.                         child = ax
  179.                     END IF
  180.                 END IF
  181.                 '* Move the value down if it is still not smaller than either one of
  182.                 '* its Child!!!ren:
  183.                 IF CGSortLibArr(i) > CGSortLibArr(child) THEN
  184.                     SWAP CGSortLibArr(i), CGSortLibArr(child)
  185.                     i = child
  186.                 ELSE
  187.                     '* Otherwise, CGSortLibArr() has been restored to a heap from start to MaxLevel!!!,
  188.                     '* so exit:
  189.                     EXIT DO
  190.                 END IF
  191.         END SELECT
  192.     LOOP
  193.  
  194. SUB PercolateUp (CGSortLibArr() AS DOUBLE, start AS _UNSIGNED LONG, MaxLevel AS _UNSIGNED LONG, order AS INTEGER)
  195.     DIM parent AS _UNSIGNED LONG
  196.     SELECT CASE order
  197.         CASE 1
  198.             i = MaxLevel
  199.             '* Move the value in CGSortLibArr(MaxLevel!!!) up the heap until it has
  200.             '* reached its proper node (that is, until it is greater than either
  201.             '* of its Child!!! nodes, or until it has reached 1, the top of the heap):
  202.             DO UNTIL i = start
  203.                 '* Get the subscript for the parent node.
  204.                 parent = start + (i - start) \ 2
  205.                 '* The value at the current node is still bigger than the value at
  206.                 '* its parent node, so swap these two array elements:
  207.                 IF CGSortLibArr(i) > CGSortLibArr(parent) THEN
  208.                     SWAP CGSortLibArr(parent), CGSortLibArr(i)
  209.                     i = parent
  210.                 ELSE
  211.                     '* Otherwise, the element has reached its proper place in the heap,
  212.                     '* so exit this procedure:
  213.                     EXIT DO
  214.                 END IF
  215.             LOOP
  216.         CASE ELSE
  217.             i = MaxLevel
  218.             '* Move the value in CGSortLibArr(MaxLevel!!!) up the heap until it has
  219.             '* reached its proper node (that is, until it is greater than either
  220.             '* of its Child!!! nodes, or until it has reached 1, the top of the heap):
  221.             DO UNTIL i = start
  222.                 '* Get the subscript for the parent node.
  223.                 parent = start + (i - start) \ 2
  224.                 '* The value at the current node is still smaller than the value at
  225.                 '* its parent node, so swap these two array elements:
  226.                 IF CGSortLibArr(i) < CGSortLibArr(parent) THEN
  227.                     SWAP CGSortLibArr(parent), CGSortLibArr(i)
  228.                     i = parent
  229.                 ELSE
  230.                     '* Otherwise, the element has reached its proper place in the heap,
  231.                     '* so exit this procedure:
  232.                     EXIT DO
  233.                 END IF
  234.             LOOP
  235.     END SELECT
  236.  

InsertionSort was not very friendly to _UNSIGNED until modified this way:

Code: QB64: [Select]
  1. testn = 0
  2.     testn = testn + 1
  3.     REDIM a(0 TO testn) AS DOUBLE '* 65s
  4.     FOR u = LBOUND(a) TO UBOUND(a)
  5.         a(u) = RND
  6.     NEXT
  7.     s! = TIMER(.001)
  8.     QuickSortIntrospective a(), LBOUND(a), UBOUND(a), 1
  9.     f! = TIMER(.001)
  10.     'd$ = "." + STRING$(15, "#")
  11.     DIM lp AS _UNSIGNED LONG
  12.     lp = LBOUND(a)
  13.     FOR u = LBOUND(a) TO UBOUND(a)
  14.         IF a(u) < a(lp) THEN STOP
  15.         'PRINT USING d$; a(u);
  16.         lp = u
  17.     NEXT
  18.     _CLIPBOARD$ = _CLIPBOARD$ + STR$(testn) + STR$(1000 * (f! - s!)) + CHR$(13) + CHR$(10)
  19.     testn = testn * 2
  20.  
  21.  
  22. '****************************************
  23. '* The IntroSort() algorithm extended to QBxx because there is no qbxx-compatible code
  24. '* The IntroSort algorithm extended to qb64 because i could find no pure qbxx code
  25. '* 03Jun2017, by CodeGuy -- further mods for use in sorting library 03 Aug 2017
  26. '* Introspective Sort (IntroSort) falls back to MergeSort after so many levels of
  27. '* recursion and is good for highly redundant data (aka few unique)
  28. '* for very short runs, it falls back to InsertionSort.
  29.  
  30. SUB QuickSortIntrospective (CGSortLibArr() AS DOUBLE, IntroSort_start AS _UNSIGNED LONG, IntroSort_finish AS _UNSIGNED LONG, order AS INTEGER)
  31.     DIM IntroSort_i AS _UNSIGNED LONG
  32.     DIM IntroSort_J AS _UNSIGNED LONG
  33.     STATIC IntroSort_level AS INTEGER
  34.     STATIC IntroSort_MaxRecurseLevel AS INTEGER
  35.     IntroSort_MaxRecurseLevel = 15
  36.     IF IntroSort_start < IntroSort_finish THEN
  37.         IF IntroSort_finish - IntroSort_start > 31 THEN
  38.             IF IntroSort_level > IntroSort_MaxRecurseLevel THEN
  39.                 HeapSort CGSortLibArr(), IntroSort_start, IntroSort_finish, order
  40.             ELSE
  41.                 IntroSort_level = IntroSort_level + 1
  42.                 QuickSortIJ CGSortLibArr(), IntroSort_start, IntroSort_finish, IntroSort_i, IntroSort_J, order
  43.                 QuickSortIntrospective CGSortLibArr(), IntroSort_start, IntroSort_J, order
  44.                 QuickSortIntrospective CGSortLibArr(), IntroSort_i, IntroSort_finish, order
  45.                 IntroSort_level = IntroSort_level - 1
  46.             END IF
  47.         ELSE
  48.             InsertionSort CGSortLibArr(), IntroSort_start, IntroSort_finish, order
  49.         END IF
  50.     END IF
  51.  
  52. SUB QuickSortIJ (CGSortLibArr() AS DOUBLE, start AS _UNSIGNED LONG, finish AS _UNSIGNED LONG, i AS _UNSIGNED LONG, j AS _UNSIGNED LONG, order AS INTEGER)
  53.     DIM Compare AS DOUBLE '* MUST be the same type as CGSortLibArr()
  54.     i = start
  55.     j = finish
  56.     Compare = CGSortLibArr(i + (j - i) \ 2)
  57.     SELECT CASE order
  58.         CASE 1
  59.             DO
  60.                 DO WHILE CGSortLibArr(i) < Compare
  61.                     i = i + 1
  62.                 LOOP
  63.                 DO WHILE CGSortLibArr(j) > Compare
  64.                     j = j - 1
  65.                 LOOP
  66.                 IF i <= j THEN
  67.                     IF i <> j THEN
  68.                         SWAP CGSortLibArr(i), CGSortLibArr(j)
  69.                     END IF
  70.                     i = i + 1
  71.                     j = j - 1
  72.                 END IF
  73.             LOOP UNTIL i > j
  74.         CASE ELSE
  75.             DO
  76.                 DO WHILE CGSortLibArr(i) > Compare
  77.                     i = i + 1
  78.                 LOOP
  79.                 DO WHILE CGSortLibArr(j) < Compare
  80.                     j = j - 1
  81.                 LOOP
  82.                 IF i <= j THEN
  83.                     IF i <> j THEN
  84.                         SWAP CGSortLibArr(i), CGSortLibArr(j)
  85.                     END IF
  86.                     i = i + 1
  87.                     j = j - 1
  88.                 END IF
  89.             LOOP UNTIL i > j
  90.     END SELECT
  91.  
  92. SUB InsertionSort (CGSortLibArr() AS DOUBLE, start AS _UNSIGNED LONG, finish AS _UNSIGNED LONG, order AS INTEGER)
  93.     DIM InSort_Local_ArrayTemp AS DOUBLE
  94.     DIM InSort_Local_i AS _UNSIGNED LONG
  95.     DIM InSort_Local_j AS _UNSIGNED LONG
  96.     SELECT CASE order
  97.         CASE 1
  98.             FOR InSort_Local_i = start + 1 TO finish
  99.                 InSort_Local_j = InSort_Local_i - 1
  100.                 IF CGSortLibArr(InSort_Local_i) < CGSortLibArr(InSort_Local_j) THEN
  101.                     InSort_Local_ArrayTemp = CGSortLibArr(InSort_Local_i)
  102.                     DO WHILE InSort_Local_j + 1 > start
  103.                         IF (InSort_Local_ArrayTemp < CGSortLibArr(InSort_Local_j)) THEN
  104.                             CGSortLibArr(InSort_Local_j + 1) = CGSortLibArr(InSort_Local_j)
  105.                             InSort_Local_j = InSort_Local_j - 1
  106.                         ELSE
  107.                             EXIT DO
  108.                         END IF
  109.                     LOOP
  110.                     CGSortLibArr(InSort_Local_j + 1) = InSort_Local_ArrayTemp
  111.                 END IF
  112.             NEXT
  113.         CASE ELSE
  114.             FOR InSort_Local_i = start + 1 TO finish
  115.                 InSort_Local_j = InSort_Local_i - 1
  116.                 IF CGSortLibArr(InSort_Local_i) > CGSortLibArr(InSort_Local_j) THEN
  117.                     InSort_Local_ArrayTemp = CGSortLibArr(InSort_Local_i)
  118.                     DO WHILE InSort_Local_j + 1 > start
  119.                         IF (InSort_Local_ArrayTemp > CGSortLibArr(InSort_Local_j)) THEN
  120.                             CGSortLibArr(InSort_Local_j + 1) = CGSortLibArr(InSort_Local_j)
  121.                             InSort_Local_j = InSort_Local_j - 1
  122.                         ELSE
  123.                             EXIT DO
  124.                         END IF
  125.                     LOOP
  126.                     CGSortLibArr(InSort_Local_j + 1) = InSort_Local_ArrayTemp
  127.                 END IF
  128.             NEXT
  129.     END SELECT
  130.  
  131. SUB HeapSort (CGSortLibArr() AS DOUBLE, start AS _UNSIGNED LONG, finish AS _UNSIGNED LONG, order AS INTEGER)
  132.     FOR i = start + 1 TO finish
  133.         PercolateUp CGSortLibArr(), start, i, order
  134.     NEXT i
  135.  
  136.     FOR i = finish TO start + 1 STEP -1
  137.         SWAP CGSortLibArr(start), CGSortLibArr(i)
  138.         PercolateDown CGSortLibArr(), start, i - 1, order
  139.     NEXT i
  140.  
  141. SUB PercolateDown (CGSortLibArr() AS DOUBLE, start AS _UNSIGNED LONG, MaxLevel AS _UNSIGNED LONG, order AS INTEGER)
  142.     DIM child AS _UNSIGNED LONG
  143.     DIM ax AS _UNSIGNED LONG
  144.     i = start
  145.     '* Move the value in GetPixel!!!(start) down the heap until it has
  146.     '* reached its proper node (that is, until it is less than its parent
  147.     '* node or until it has reached MaxLevel!!!, the bottom of the current heap):
  148.     DO
  149.         child = 2 * (i - start) + start ' Get the subscript for the Child node.
  150.         '* Reached the bottom of the heap, so exit this procedure:
  151.         IF child > MaxLevel THEN EXIT DO
  152.         SELECT CASE order
  153.             CASE 1
  154.                 '* If there are two Child nodes, find out which one is bigger:
  155.                 ax = child + 1
  156.                 IF ax <= MaxLevel THEN
  157.                     IF CGSortLibArr(ax) > CGSortLibArr(child) THEN
  158.                         child = ax
  159.                     END IF
  160.                 END IF
  161.  
  162.                 '* Move the value down if it is still not bigger than either one of
  163.                 '* its Child!!!ren:
  164.                 IF CGSortLibArr(i) < CGSortLibArr(child) THEN
  165.                     SWAP CGSortLibArr(i), CGSortLibArr(child)
  166.                     i = child
  167.                 ELSE
  168.                     '* Otherwise, CGSortLibArr() has been restored to a heap from start to MaxLevel!!!,
  169.                     '* so exit:
  170.                     EXIT DO
  171.                 END IF
  172.             CASE ELSE
  173.                 '* If there are two Child nodes, find out which one is smaller:
  174.                 ax = child + 1
  175.                 IF ax <= MaxLevel THEN
  176.                     IF CGSortLibArr(ax) < CGSortLibArr(child) THEN
  177.                         child = ax
  178.                     END IF
  179.                 END IF
  180.                 '* Move the value down if it is still not smaller than either one of
  181.                 '* its Child!!!ren:
  182.                 IF CGSortLibArr(i) > CGSortLibArr(child) THEN
  183.                     SWAP CGSortLibArr(i), CGSortLibArr(child)
  184.                     i = child
  185.                 ELSE
  186.                     '* Otherwise, CGSortLibArr() has been restored to a heap from start to MaxLevel!!!,
  187.                     '* so exit:
  188.                     EXIT DO
  189.                 END IF
  190.         END SELECT
  191.     LOOP
  192.  
  193. SUB PercolateUp (CGSortLibArr() AS DOUBLE, start AS _UNSIGNED LONG, MaxLevel AS _UNSIGNED LONG, order AS INTEGER)
  194.     DIM parent AS _UNSIGNED LONG
  195.     SELECT CASE order
  196.         CASE 1
  197.             i = MaxLevel
  198.             '* Move the value in CGSortLibArr(MaxLevel!!!) up the heap until it has
  199.             '* reached its proper node (that is, until it is greater than either
  200.             '* of its Child!!! nodes, or until it has reached 1, the top of the heap):
  201.             DO UNTIL i = start
  202.                 '* Get the subscript for the parent node.
  203.                 parent = start + (i - start) \ 2
  204.                 '* The value at the current node is still bigger than the value at
  205.                 '* its parent node, so swap these two array elements:
  206.                 IF CGSortLibArr(i) > CGSortLibArr(parent) THEN
  207.                     SWAP CGSortLibArr(parent), CGSortLibArr(i)
  208.                     i = parent
  209.                 ELSE
  210.                     '* Otherwise, the element has reached its proper place in the heap,
  211.                     '* so exit this procedure:
  212.                     EXIT DO
  213.                 END IF
  214.             LOOP
  215.         CASE ELSE
  216.             i = MaxLevel
  217.             '* Move the value in CGSortLibArr(MaxLevel!!!) up the heap until it has
  218.             '* reached its proper node (that is, until it is greater than either
  219.             '* of its Child!!! nodes, or until it has reached 1, the top of the heap):
  220.             DO UNTIL i = start
  221.                 '* Get the subscript for the parent node.
  222.                 parent = start + (i - start) \ 2
  223.                 '* The value at the current node is still smaller than the value at
  224.                 '* its parent node, so swap these two array elements:
  225.                 IF CGSortLibArr(i) < CGSortLibArr(parent) THEN
  226.                     SWAP CGSortLibArr(parent), CGSortLibArr(i)
  227.                     i = parent
  228.                 ELSE
  229.                     '* Otherwise, the element has reached its proper place in the heap,
  230.                     '* so exit this procedure:
  231.                     EXIT DO
  232.                 END IF
  233.             LOOP
  234.     END SELECT
  235.  
« Last Edit: October 18, 2018, 04:58:03 am by codeguy »

Offline codeguy

  • Forum Regular
  • Posts: 174
    • View Profile
Re: DIM a (Max Index)
« Reply #32 on: October 18, 2018, 03:43:42 am »
There really is no point changing from long because long before you overflow the range of long numbers, you'll run out of heap space anyway, even storing _BYTE arrays. at the most, 1, MAYBE 2 GB will be in memory while the other 5 or 4 is untouched, awaiting processing. My suggestion? split whatever it is you're sorting into no more than 1 GB files and sort each individually. Then merge those files sequentially. Even optimistically, I wouldn't expect anything greater than (1/2 million BYTE-size elements/second for each GHz of CPU speed) for a generalized comparison sort. You MIGHT be lucky and get 1.5 GB/GHz this way. Yes, there are specialized sorts that can handle FAR better hourly volume, but the information you give us makes determination of a truly suitable algorithm pretty much impossible. Believe me, I know. I am among the most viewed and upvoted in Sorting algorithms on Quora, as well as a Top Writer for 2018. I don't know everything, but I am very well versed in what I do know. Please feel free to add more specifics and clarify what you really need so we can give a more satisfactory answer. I have only discussed POSSIBLE solutions to your problem as described as that is all the information you've given allows.

Offline soniaa

  • Newbie
  • Posts: 18
    • View Profile
Re: DIM a (Max Index)
« Reply #33 on: October 18, 2018, 11:57:10 am »
The numbers I have,
is in thes style;

ARRAY     ( DIMensioned  _INTEGER64 )       -         (NO strigs)  message modificated
q1
q2
q3
....
last is q31

NO q(1) etccc

The value in those ARRAY  are in the range (1 - 8.589.934.590)

In the program these numbers they arrive not in sequence !!!  ....and I do not need to sort.

Every time in the moment I have acquire a number, I need to know if that number has already been processed;
q1 .... q31
 - IF Already processed  THEN  howmanytimesthisnumberalreadyprocessed = howmanytimesthisnumberalreadyprocessed +1
   (howmanytimesthisnumberalreadyprocessed  is in the range  0 - 255  (_UNSIGNED _BYTE)

 - IF NO Already processed  THEN  record this because is an new number
   (IF NO Already processed in this case to increase speed, it is not necessary also assign 1 at howmanytimesthisnumberalreadyprocessed)

« Last Edit: October 18, 2018, 12:10:59 pm by soniaa »

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
    • View Profile
    • Steve’s QB64 Archive Forum
Re: DIM a (Max Index)
« Reply #34 on: October 18, 2018, 12:38:17 pm »
Easiest way in this case is a BINARY file:

DIM number AS _INTEGER64
DIM TimesProcessed AS _UNSIGNED _BYTE
OPEN "RecordCount.bin" FOR BINARY AS #1

DO
    INPUT number
    GET #1, number, TimesProcessed
    TimesProcessed = TimesProcessed + 1
    PUT #1, number, TimesProcessed
LOOP UNTIL number = -1 'or some other exit code

The above gets a number, puts it on the hard drive in a file which will be 8,589,934,590 bytes in size, and each byte will tell you how many times that number has "been processed".  (Up to a maximum of 255 times).

***********************************

Now, this method is assuming you have 8 billion records to work with.  If you have a limited dataset (the q31 almost seems to indicate there's only 31 records with values from 0 to 8.5B??), then I'd build an index, store the value and count, and keep it all in memory for speed and efficiency.

But, since it seems like you're going to process billions of records (2 years to run), saving the results in a file seems to me to be the best way to do this.
   
« Last Edit: October 18, 2018, 12:40:55 pm by SMcNeill »
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

Offline soniaa

  • Newbie
  • Posts: 18
    • View Profile
Re: DIM a (Max Index)
« Reply #35 on: October 18, 2018, 12:43:01 pm »
the strategies I tried are the following:

strategy1;
create file.txt  for any number that the program create
into this .txt file write howmanytimesthisnumberalreadyprocessed
To do this I do;
 - IF _FILEEXISTS  (if 0 is an new number , if -1 no new number)
 - OPEN FOR APPEND  (if no exist this procedure create the file)
 - WRITE  1  (append new line in .txt file contain "1"   -  it can also be good in this way)
 - CLOSE

strategy2;
DIM Array(1 to 8589934590) AS _UNSIGNED _BYTE
 - IF Array(mynuber) = 0  THEN  GOTO ......
 - IF Array(mynuber) = >0  THEN  Array(mynuber)  =  Array(mynuber)  +1

strategy3;
OPEN "c.dat" FOR BINARY AS #1
GET #1  ,   mynumber  ,   howmanytimesthisnumberalreadyprocessed

 - IF   howmanytimesthisnumberalreadyprocessed  >  0  THEN     PUT #1  ,  mynumber  ,  howmanytimesthisnumberalreadyprocessed  + 1 :GOTO...

 - PUT #1  ,  mynumber  ,  1

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
    • View Profile
    • Steve’s QB64 Archive Forum
Re: DIM a (Max Index)
« Reply #36 on: October 18, 2018, 12:48:50 pm »
Strategy 3 doesn't even require an IF; just an increment since 0 + 1 = 1

Get #1, number, TimesProcessed
TimesProcessed = TimesProcessed + 1
Put #1, number, TimesProcessed

Logically, if the value is 0, it's going to increment to 1 when you do the math.
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

Offline TempodiBasic

  • Forum Resident
  • Posts: 1792
    • View Profile
Re: DIM a (Max Index)
« Reply #37 on: October 18, 2018, 01:44:32 pm »
Hi guys /Ciao ragazzi
Hi Soniaa/Ciao Soniaa
welcome into the bilingual post /benvenuta nel post bilingue

1. it seems that you must calculate frequency of a number in a gigasize array of data ranging as you post
/1. sembra che tu debba calcolare la frequenza di un numero in un array delle dimensioni di GIGA di dati con un range di valori che hai messaggiato

Quote
The numbers I have,
is in thes style;

ARRAY  ( DIMensioned  _INTEGER64 )         -         (NO strigs)  message modificated
q1
q2
q3
....
last is q31

NO q(1) etccc

The value in those ARRAY  are in the range (1 - 8.589.934.590)  ( _INTEGER64 )

Quote
my question;
in the program I'm making, I estimated that the processing process is 2 years.
I have to adopt more strategies to get the maximum speed.

In some phases of the program, I have numbers in the range (1 to 8.589.934.591) (33bit)
NOT all numbers in the range, ONLY 20%.

I have to check if the number I'm processing has already been processed and how many times.

2.about https://www.qb64.org/forum/index.php?topic=705.msg5867#msg5867
 if you are not able to speak ItalianEnglish please help yourself with Google Translator, it is better than nothing! Moreover you can break the language barrier and you can get more from help of great coders like those are answering to you (except me I am coder as hobby)
/2. riguardo a https://www.qb64.org/forum/index.php?topic=705.msg5867#msg5867
se non sai parlare l'IngleseItalianizzato, prego aiutati con Google Translator, è meglio di niente! Inoltre puoi infrangere la barriera del linguaggio e puoi ottenere di più dall'aiuto di grandi programmatori come quelli che ti stanno rispondendo (eccetto me che sono programmatore per hobby)

3. about this
Quote
(howmanytimesthisnumberalreadyprocessed  is in the range  0 - 255  (_UNSIGNED _BYTE)
my questions to understand better (you are talking about processing for at least 2 years, an error should waste much time!):
  3.1 how can you say surely that the frequency cannot be more than 255?
  3.2 must  the numbers that are in the range (1 - 8.589.934.590) and that are not acquired,  be registered as 0 frequency?
/3. circa questo 
Quote
(howmanytimesthisnumberalreadyprocessed  is in the range  0 - 255  (_UNSIGNED _BYTE)
le mie domande per capire meglio (stai parlado di processare per almeno 2 anni, un errore sprecherebbe molto tempo!):
  3.1 come puoi dire sicuramente che la frequenza non può essere maggiore di 255?
  3.2 i numeri che sono nel range  (1 - 8.589.934.590) e che non sono acquisiti devono essere registrati con frequenza 0?

4. about ARRAY
Quote
ARRAY     ( DIMensioned  _INTEGER64 )       -         (NO strigs)  message modificated
q1
q2
q3
....
last is q31

NO q(1) etccc

Quote
Every time in the moment I have acquire a number, I need to know if that number has already been processed;
q1 .... q31
 - IF Already processed  THEN  howmanytimesthisnumberalreadyprocessed = howmanytimesthisnumberalreadyprocessed +1
   (howmanytimesthisnumberalreadyprocessed  is in the range  0 - 255  (_UNSIGNED _BYTE)

 - IF NO Already processed  THEN  record this because is an new number
   (IF NO Already processed in this case to increase speed, it is not necessary also assign 1 at howmanytimesthisnumberalreadyprocessed)

are you sure that this is an array? You talk of array q1 q2 q3  until q31 and not q(1) q(2).... It seems to me that this is a packed field data to manage by UDT
4.1 How can you distinguish among q1 q2 q3 ..q31 of the array in the acquiring phase of data? Or do you acquire a single number of large range?
4.2 How do you know that you can have only q31 number to process and to count the frequency?

/4. riguardo al tuo ARRAY
sei sicura che questo sia un array?  lo definisci composto da q31 elementi da q1 a q31 ma non tipo q(1) q(2)...q(31)
mi sembra che questo sia più un gruppo di dati compressi da gestire tramite UDT.
 4.1 Come puoi distinguere tra q1 q2 q3...q31 dell'array nella fase di acquisizione dei dati? Oppure tu acquisci un singolo numero di ampio range?
4.2 Come fai a sapere che tu puoi avere solo q31 numeri da processare e di cui contare la frequenza?

5. can you be clearer using a pseudocode with large chunks? i.e 
Quote
acquiring a large number ranging from 1 to 8.......  -> record it in a list of processed number -> count how many times it has been processed
/5. puoi essere più chiara usando pseudocodice a grandi blocchi_operazioni? per esempio
Quote
acquisire un grande numero nel range da 1 to 8...... -> registrare il numero in una lista di numeri processati -> contare quante volte esso sia stato processato

Thanks to read
/Grazie per la lettura

Programming isn't difficult, only it's  consuming time and coffee

Offline soniaa

  • Newbie
  • Posts: 18
    • View Profile
Re: DIM a (Max Index)
« Reply #38 on: October 18, 2018, 02:41:28 pm »
Thanks for all reply;
I do not care about ordering the numbers.

in reply to;

SMcNeill;
the IF is necessary fom my because;
 - if GET acquisition >1  THEN  put value +1   AND  goto  inaditection
 - else   PUT  value  1  AND  goto otherdirection

 TempodiBasic;
3.1 -  its max 255
3.2 -  it is not necessary
4.1 -  I have write a program in mode for no use q(1) because using this have lowww performance,  q1 is moooore fasttt
4.2 -  the program loop in 31 step  forward and back billions of times  (brute force)
5.0 -  ?????

Offline soniaa

  • Newbie
  • Posts: 18
    • View Profile
Re: DIM a (Max Index)
« Reply #39 on: October 18, 2018, 02:58:20 pm »
The numbers generated by the array   q1   q2  .... q31    they can be  stored together in an singe  registry,
but in some way they must be associated with  howmanytimesalreadyprocessed (range 1 - 255 )

the registry  it must not have  references  of  the  number  has arrived by q1 or q2...
« Last Edit: October 18, 2018, 02:59:53 pm by soniaa »

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
    • View Profile
    • Steve’s QB64 Archive Forum
Re: DIM a (Max Index)
« Reply #40 on: October 18, 2018, 03:10:20 pm »
SMcNeill;
the IF is necessary fom my because;
 - if GET acquisition >1  THEN  put value +1   AND  goto  inaditection
 - else   PUT  value  1  AND  goto otherdirection

The IF action is independent of the GET/ PUT action.

Get #1, number, TimesProcessed
TimesProcessed = TimesProcessed + 1
Put #1, number, TimesProcessed
IF TimesProcessed = 1 THEN GOTO inaditection ELSE GOTO otherdirection
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

Offline soniaa

  • Newbie
  • Posts: 18
    • View Profile
Re: DIM a (Max Index)
« Reply #41 on: October 18, 2018, 03:48:46 pm »
SMcNeill;
the IF is necessary fom my because;
 - if GET acquisition >1  THEN  put value +1   AND  goto  inaditection
 - else   PUT  value  1  AND  goto otherdirection

The IF action is independent of the GET/ PUT action.

Get #1, number, TimesProcessed
TimesProcessed = TimesProcessed + 1
Put #1, number, TimesProcessed
IF TimesProcessed = 1 THEN GOTO inaditection ELSE GOTO otherdirection


****what you wrote  is exactly the same as what I wrote !!??...
...or no?

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
    • View Profile
    • Steve’s QB64 Archive Forum
Re: DIM a (Max Index)
« Reply #42 on: October 18, 2018, 04:02:31 pm »
Here's a real world implementation of how I'd handle this problem:

Code: QB64: [Select]
  1. DIM TimesProcessed AS _UNSIGNED _BYTE
  2.  
  3.  
  4. PRINT "Clearing and Opening Multiple files.  This will take a few seconds."
  5. FOR i = 0 TO 8589 'multiple files to hold our values, since we have such a large range for output
  6.     file$ = "FileCounter" + LTRIM$(RTRIM$(STR$(i))) + ".bin"
  7.     OPEN file$ FOR OUTPUT AS #i + 1: CLOSE #i + 1 'blank the file if it exists
  8.     OPEN file$ FOR BINARY AS #i + 1 'open a new file to track values
  9.  
  10. Limit = 100 'The number of values we're going to process (times 31, since you have values from q1 to q33)
  11.  
  12.  
  13. StartTime## = TIMER
  14. FOR i = 1 TO Limit
  15.     FOR j = 1 TO 31 '31 times since the values are for q1 to q31
  16.         q = INT(RND * 8589934590)
  17.         FileNumber = q \ 1000000
  18.         ValueStored = q MOD 1000000
  19.         GET #FileNumber + 1, ValueStored, TimesProcessed
  20.         TimesProcessed = TimesProcessed + 1
  21.         PUT #FileNumber + 1, ValueStored, TimesProcessed
  22.         IF TimesProcessed = 1 THEN PRINT "New Entry", ELSE PRINT "Processed"; TimesProcessed; " already",
  23.     NEXT
  24. CLOSE 'close all files
  25. FinishTime## = TIMER - StartTime##
  26.  
  27. PRINT USING "#####.##### seconds to process 31 numbers, ### times."; FinishTime##, Limit
  28.  
  29. PRINT "Clean Up the Disk?  (Y/N)"
  30.     _LIMIT 30
  31.     i$ = UCASE$(INKEY$)
  32. LOOP UNTIL i$ = "Y" OR i$ = "N"
  33.  
  34. IF i$ = "Y" THEN
  35.     PRINT
  36.     PRINT "Cleaning disk... This too may take several seconds."
  37.     FOR i = 0 TO 8589 'multiple files to hold our values, since we have such a large range for output
  38.         file$ = "FileCounter" + LTRIM$(RTRIM$(STR$(i))) + ".bin"
  39.         KILL file$
  40.     NEXT
  41.  

Now, a few things to note:

1) Notice that I've broken my data down to 1MB files?  This is because it's much faster for the drive to reserve space for a new file that's 1MB in size, than it is to do the same for one that's 8GB in size.  In fact, it's a lot faster to create almost 9000 1MB files, than it is for my drive to create and reserve space for a single 8GB one.  Play around with the values if you want and see how the speed is affected...

2) It still takes me about 40 seconds to process 100 sets of numbers, using this method -- but that's the hit you're going to get for having and using so much file access.  Times will probably be much better on your SSD than it is on my old one here, but it's still doing to be a lot slower than running some sort of counter in memory.

3) You've never told us what the MAXIMUM AMOUNT OF DATA NEEDS TO BE PROCESSED.  If there's less than a million total records, I'd use an index to see what values have appeared, increment them, and keep the whole process in memory.  You've told us that there's 31 values, ranging from 0 to 8.5B, but you haven't mentioned how many times you're going to repeat this process.  If there's 31 total values, there's absolutely no reason to use file access to track number of times a value is processed.  I'd only go the file route, like above, if it was a process which I had to repeat billions of times for those q1 to q31 numbers.  (And how can you be certain that they're only going to occur less than 255 times total, if there's billions of times of them being processed?)

From what you've said, the above is the best way I can think of the deal with the problem you've presented: tracking a counter of a limitless number of passes, of 31 values, which range from 0 to 8GB...  If what you need ISN'T for a limitless number of passes, then give us an idea of the max number of passes which are used with your data.  There's MUCH better ways to deal with the issue, with smaller datasets.
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

Offline soniaa

  • Newbie
  • Posts: 18
    • View Profile
Re: DIM a (Max Index)
« Reply #43 on: October 18, 2018, 07:11:31 pm »
I repeat;
The number I want stored for compare are in the range 0  to 33bit.
No all number in the range the program create, only approximately 20 -30 %.
The program create these number billion of time.
Every when a program create an number I have to know if that number is already processed, then store how many times.
thats is all,
no other.

I think the solutions OPEN c.dat for binary is for the moment the best I try.
If you have other solution muck speed write to me.

Offline TempodiBasic

  • Forum Resident
  • Posts: 1792
    • View Profile
Re: DIM a (Max Index)
« Reply #44 on: October 18, 2018, 07:43:36 pm »
@soniaa

I can't imagine how to structure a program to do what you are talking about!
so I read again your requests in this thread:

the first:
Quote
I have 8.000.000.000 of items,
the value of any ia is max 100
I want create:
Dim q(8000000000) as _byte
BUT THES Produce error
in my TOSHIBA i3 8GB di ram it gives me Out of memory error on compiling

the second:
Quote
I have a goog idea;
Many of my data is value 0
i can no create array for this,
in thes mode the array I want to DIM are approximately 3.000.000.000
Now BIG Problem;
I want put my data into the arrays one by one,
example:
visitornumber1000000000 = 5
visitornumber2000000000 = 7
I do;
DIM a(1000000000 to 1000000000) as _BYTE
a(1000000000) = 5
but why do you create an array made by one single element with index 1000000000?

the third:
Quote
I have test this;
1  REM $DYNAMIC 
2  DIM a(1000000000 to 1000000000) as _BYTE
3  a(1000000000) = 5
4  REDIM _PRESERVE a(2000000000 to 2000000000)
But receive error
I cannot agree.
If I copy and paste this your code in QB64 ide I get compiling no error!
But we must observe that there is a logic error...the array a() as _BYTE on linecode 2 and the array a () on linecode 4 are not the same array, the first is of type _BYTE while the second is of type SINGLE....
moreover also if I add as _BYTE at the end of linecode 4 it runs ok in QB64.

the fourth
Quote
In some phases of the program, I have numbers in the range (1 to 8.589.934.591) (33bit)
NOT all numbers in the range, ONLY 20%.

I have to check if the number I'm processing has already been processed and how many times.
How many numbers? the only 20% of how many? We must process only 20% of numbers because only they are in the range.
We must record what number and how many times has been precessed. But the 20% of infinite is infinite! :-)
How can you be specific about number of number to process?

the fifth
Quote
The numbers I have,
is in thes style;

ARRAY  ( DIMensioned  _INTEGER64 )         -         (NO strigs)  message modificated
q1
q2
q3
....
last is q31

NO q(1) etccc

The value in those ARRAY  are in the range (1 - 8.589.934.590)  ( _INTEGER64 )
Here you decide to use 31 variables _INTEGER64. Why you do not say!

the sixth:
Quote
In the program these numbers they arrive not in sequence !!!  ....and I do not need to sort.

Every time in the moment I have acquire a number, I need to know if that number has already been processed;
q1 .... q31
Again how many numbers we do not know! Why do you use a cluster of 31 variable of _INTEGER64?

the seventh
Quote
strategy1;
create file.txt  for any number that the program create
into this .txt file write howmanytimesthisnumberalreadyprocessed
To do this I do;
 - IF _FILEEXISTS  (if 0 is an new number , if -1 no new number)
 - OPEN FOR APPEND  (if no exist this procedure create the file)
 - WRITE  1  (append new line in .txt file contain "1"   -  it can also be good in this way)
 - CLOSE
IMHO file/disk access is slower than RAM access... also if you can optimize this strategy above showed

the eighth
Quote
strategy2;
DIM Array(1 to 8589934590) AS _UNSIGNED _BYTE
 - IF Array(mynuber) = 0  THEN  GOTO ......
 - IF Array(mynuber) = >0  THEN  Array(mynuber)  =  Array(mynuber)  +1
this strategy can work if you have enough RAM but so you have already solved the issue

the nineth
Quote
strategy3;
OPEN "c.dat" FOR BINARY AS #1
GET #1  ,   mynumber  ,   howmanytimesthisnumberalreadyprocessed

 - IF   howmanytimesthisnumberalreadyprocessed  >  0  THEN     PUT #1  ,  mynumber  ,  howmanytimesthisnumberalreadyprocessed  + 1 :GOTO...

 - PUT #1  ,  mynumber  ,  1
IMHO an access to file/disk in bynary mode should be faster than to txt files but again slower to RAM access.

the tenth
Quote
the program loop in 31 step  forward and back billions of times  (brute force)
Here I ask "Why a loop in 31 steps and not 31 times a loop of a single step as all 31 variable are _INTEGER64?

the eleventh
Quote
The numbers generated by the array   q1   q2  .... q31    they can be  stored together in an singe  registry,
but in some way they must be associated with  howmanytimesalreadyprocessed (range 1 - 255 )
a simple UDT  can do this:
Code: QB64: [Select]
  1. TYPE q31
  2.     HowMany AS _UNSIGNED _BYTE
  3.     q AS _INTEGER64
each variable of this type uses 9 byte of RAM (8 for _INTEGER64 and 1 for _BYTE) using this structure you can save/load from file/disk each variable with one access and the first byte is byself loaded in howmany and the last 8 bytes in q that record the value of number.

the twelveth
Quote
The number I want stored for compare are in the range 0  to 33bit.
No all number in the range the program create, only approximately 20 -30 %.
The program create these number billion of time.
Every when a program create an number I have to know if that number is already processed, then store how many times.
thats is all,

the other 80-70% of numbers generated by program are <1 and > 8.589.934.590
you process only numbers in the range 1--8.589.934.590 and you rate the frequency....
well the 31 variables _INTEGER64 seems only an your choice in projecting the program...
now I can create a right generator of numbers to emulate your issue and show how is my solution!

Perchè non ti aiuti con il traduttore di Google o Reverso, non è il massimo ma almeno costruisci la frase in inglese più correttamente.
Non tutti i membri di questo forum sono inglesi o americani.

Good Coding
Programming isn't difficult, only it's  consuming time and coffee