QB64.org Forum

Active Forums => QB64 Discussion => Topic started by: soniaa on October 15, 2018, 10:58:31 pm

Title: DIM a (Max Index)
Post by: soniaa on October 15, 2018, 10:58:31 pm
Hi, sorry for my english, ...I'm ITA,
I have a problem,
I have 8.000.000.000 of items,
the value of any ia is max 100
I want create:
Dim q(8000000000) as _byte
BUT THES Produce error
the max index accept is 4292870137
Please help me to solve thes problem,
ciao da Soniaa.
Title: Re: DIM a (Max Index)
Post by: soniaa on October 15, 2018, 11:02:07 pm
I use win 7 64 bit
qb64  (64bit version)
my ram is 6GB
Title: Re: DIM a (Max Index)
Post by: bplus on October 15, 2018, 11:17:27 pm
Hi soniaa,

Welcome to the forum!

I doubt this will be very satisfactory, but in the old days we extended memory by using the hard disk for storing stuff until ready to process that part... divide and conquer.

OK, OK, hopefully someone will show up and know more about memory management than I. :)

Title: Re: DIM a (Max Index)
Post by: TerryRitchie on October 15, 2018, 11:22:00 pm
That number equals 4GB so it's more than likely the upper limit for arrays in QB64. I would suggest breaking the array into two pieces if possible, two each at 4,000,000,000 indexes each.

Edit: I just saw you have 6GB in your computer for RAM, so this task will require the hard drive to come into play as Bplus suggested.
Title: Re: DIM a (Max Index)
Post by: bplus on October 15, 2018, 11:29:50 pm
I have 8GB but no error and no answer and no hang for this:
Code: QB64: [Select]
  1. DIM a(4000000000) AS _BYTE
  2. DIM b(4000000000) AS _BYTE
  3. a(4000000000) = 1
  4. b(4000000000) = 1
  5. PRINT a(4000000000) + b(4000000000)
  6.  
Title: Re: DIM a (Max Index)
Post by: codeguy on October 16, 2018, 01:06:49 am
I think all QB64 programs have a max heap size of 2 GB or WinDoze enforces this limit. It is possible to use an _INTEGER64 to create subscripts that extend well beyond the range of 32-bit LongInt. Once past 4 billion or so with _UNSIGNED LONG, it wraps back around 0, while the standard LONG becomes negative and has a limit of about 2 billion before doing so. If you have large indexes, I suggest _INTEGER64 as the index data type.

'* && is _INTEGER64 suffix
SuperBigIndex&& = 2 ^ 37
DIM ac(SuperBigIndex&& TO SuperBigIndex&& + 1048575) AS _INTEGER64

I tried

SuperBigIndex&& = 2 ^ 37
DIM ac(0 TO SuperBigIndex&& + 1048575) AS _INTEGER64

and was not surprised to see an out of memory error.
Title: Re: DIM a (Max Index)
Post by: RhoSigma on October 16, 2018, 02:19:14 am
Quote
I want create:
Dim q(8000000000) as _byte
.
.
.
my ram is 6GB

Should be obvious that it is impossible, even if QB64 would not impose any limits on the array index. You want 8 billion bytes but you only have 6 billion bytes installed in your system.
Title: Re: DIM a (Max Index)
Post by: codeguy on October 16, 2018, 02:32:07 am
You'll most definitely need enough disk space to handle what RAM and QB64 cannot. I would make it a random access file and either segment access to 2 GB using long or use _INTEGER64 variables to specify positions for random access reads. Rho is absolutely correct. There is no way to get 8GB to fit into 6GB and still be usable.
Title: Re: DIM a (Max Index)
Post by: soniaa on October 16, 2018, 07:25:30 am
Thanks for all menbers for yours response,

I think the real motive of my problem is writed in the response of TerryRitchie,
the limit is of qb64 compiler,
not is the my RAM...
...I have do a same test whith only 2GB of RAM and the result is identical:
DIM a(1 TO 4292870136) AS _BYTE  ---- go fine
DIM a(1 TO 4292870140) AS _BYTE  ---- not  go (error chrash when start the .exe)
Title: Re: DIM a (Max Index)
Post by: soniaa on October 16, 2018, 07:49:35 am
I have a goog idea;
Many of my data is value 0
i can no create array for this,
in thes mode the array I want to DIM are approximately 3.000.000.000

Now BIG Problem;
I want put my data into the arrays one by one,
example:
visitornumber1000000000 = 5
visitornumber2000000000 = 7
I do;
DIM a(1000000000 to 1000000000) as _BYTE
a(1000000000) = 5

when I do;
DIM a(2000000000 to 2000000000) as _byte
a(2000000000) = 7
I receive error !!!

I have test this;
REM $DYNAMIC 
DIM a(1000000000 to 1000000000) as _BYTE
a(1000000000) = 5
REDIM _PRESERVE a(2000000000 to 2000000000)
But receive error

HELLPP MEE...
Thanks
Title: Re: DIM a (Max Index)
Post by: FellippeHeitor on October 16, 2018, 09:06:18 am
Your best bet is to keep your data in a file on disk instead of loading it all into memory at once. That'll solve your problem in an instant.
Title: Re: DIM a (Max Index)
Post by: bplus on October 16, 2018, 09:30:43 am
It works but is very slow to setup, after setup speedier access:
Code: QB64: [Select]
  1.  
  2. OPEN "c.dat" FOR BINARY AS #1
  3.  
  4. 'store some numbers in a really big dat file
  5. 'FOR i&& = 5000000000 TO 5000000010
  6. '    b1 = INT(RND * 120)
  7. '    PUT #1, i&&, b1
  8. '    PRINT b1; " ";
  9. 'NEXT
  10.  
  11. GET #1, 5000000000, b1
  12. GET #1, 5000000005, b2
  13. PRINT b1, b2, b1 + b2
  14.  
  15.  
Title: Re: DIM a (Max Index)
Post by: Qwerkey on October 16, 2018, 01:35:56 pm
I agree with bplus (as ever) that writing to disk is the only way and it is, unfortunately, very slow:

Code: QB64: [Select]
  1. Start! = TIMER
  2. OPEN "temp.rnd" FOR RANDOM AS #1 LEN = 1
  3. FIELD 1, 1 AS a$
  4. FOR b~& = 1 TO 30000
  5.     c%% = INT(RND * 100)
  6.     LSET a$ = CHR$(c%%)
  7.     PUT #1, b~&
  8. NEXT b~&
  9. PRINT TIMER - Start!

With just 30000 entries, my Core i5 computer takes 1s to write that many entries.  We can only wish you luck, Sonia, with 8,000,000,000 entries.  You must be wanting an entry for every person on the planet.  [Here in the UK we use commas instead of periods for the thousands separator].
Title: Re: DIM a (Max Index)
Post by: SMcNeill on October 16, 2018, 01:54:19 pm
If the issue is just with the index, and not an actual memory issue, can you read and work with the data as a different variable type?

Instead of:

Dim q(8000000000) as _byte

Try:

Dim q(2000000000) as LONG

Once in memory, it's simple enough to break those longs down to individual bytes, if necessary. 
Title: Re: DIM a (Max Index)
Post by: SMcNeill on October 16, 2018, 01:57:59 pm
I agree with bplus (as ever) that writing to disk is the only way and it is, unfortunately, very slow:

Code: QB64: [Select]
  1. Start! = TIMER
  2. OPEN "temp.rnd" FOR RANDOM AS #1 LEN = 1
  3. FIELD 1, 1 AS a$
  4. FOR b~& = 1 TO 30000
  5.     c%% = INT(RND * 100)
  6.     LSET a$ = CHR$(c%%)
  7.     PUT #1, b~&
  8. NEXT b~&
  9. PRINT TIMER - Start!

With just 30000 entries, my Core i5 computer takes 1s to write that many entries.  We can only wish you luck, Sonia, with 8,000,000,000 entries.  You must be wanting an entry for every person on the planet.  [Here in the UK we use commas instead of periods for the thousands separator].

If your disk access speeds are slow, and are a bottleneck in your program, reduce the number of times you have to access the disk as much as possible. 

Try this:

Code: QB64: [Select]
  1. Start! = TIMER
  2. OPEN "temp.rnd" FOR BINARY AS #1
  3.  
  4. t$ = ""
  5. FOR b~& = 1 TO 30000
  6.     c%% = INT(RND * 100)
  7.     LSET a$ = CHR$(c%%)
  8.     t$ = t$ + LTRIM$(RTRIM$(STR$(b~&)))
  9.     IF LEN(t$) > 1000 THEN PUT #1, , t$: t$ = ""
  10. NEXT b~&
  11. PUT #1, , t$
  12. PRINT TIMER - Start!
  13.  

Instead of writing every value to the disk, we save the results in a temp string (t$) and dump them to the file every time the string is over 1000 bytes in size...  Disk access is reduced considerably, and our access speeds are no longer the bottleneck for performance that they were.
Title: Re: DIM a (Max Index)
Post by: Qwerkey on October 17, 2018, 05:43:42 am
I can confirm (naturally) that bunching the entries together before writing to the disk as described by Steve significantly reduces the write time.
Title: Re: DIM a (Max Index)
Post by: Qwerkey on October 17, 2018, 06:22:30 am
Using the bunching method, writing 8,000,000 entries takes 7s (on my computer).  For 8,000,000,000 entries, the time taken will be 2hrs, which I suppose is not too bad to represent every person on the planet with 1 byte on a home PC.

Code: QB64: [Select]
  1. Start! = TIMER
  2. OPEN "temp.rnd" FOR RANDOM AS #1 LEN = 10000
  3. FIELD 1, 10000 AS a$
  4. FOR b~& = 1 TO 800
  5.     d$ = ""
  6.     FOR e% = 1 TO 10000
  7.         c%% = INT(RND * 100)
  8.         d$ = d$ + CHR$(c%%)
  9.     NEXT e%
  10.     LSET a$ = d$
  11.     PUT #1, b~&
  12. NEXT b~&
  13. GET #1, 2
  14. d$ = RIGHT$(a$, 1)
  15. PRINT d$, ASC(d$)
  16. PRINT TIMER - Start!

Incidentally, if you try to open the file ("temp.rnd" in the case of my code here) with Notepad, you may be surprised to find the file apparently full of non-ASCII characters.  This happens because of the use of ASCII characters below the value of 32 (some of which are non-printable).  Sampling the file and extracting individual elements always gives the expected result.  It is also comforting to note that the file with 8 million 1-byte characters is 8Mb in length.  Richard

Title: Re: DIM a (Max Index)
Post by: SMcNeill on October 17, 2018, 11:52:00 am
Using the bunching method, writing 8,000,000 entries takes 7s (on my computer).  For 8,000,000,000 entries, the time taken will be 2hrs, which I suppose is not too bad to represent every person on the planet with 1 byte on a home PC.

For best performance (which you're going to want with a file that size), first shell out to "fsutil fsinfo ntfsinfo c:" (or whichever drive you're saving to), and read the output of the "Bytes Per Physical Sector" (if it exists).

This will tell you the drives sector size and allow you to minimize disk access calls as much as possible.  Old drives (and older OSes -- before Win 8, I think), primarily used 512 bytes per sector, so that's the size you want to write to disk.  Some newer drives have 4096 bytes per sector, so use that when possible.  (I think all the new drives with 3TB+ storage use 4096 bytes per sector, if you have/use any of those.)

When in doubt, I'd default to a dumpsize of 4096 bytes, as that's a perfect multiple of 512 anyway....

My example used a size of 1000, just out of laziness, but it's not the most efficient.  If 512 is the sector size, we'd write 512 bytes at once on the drive, then only write 488 the second time...  We're still making more access calls than absolutely necessary.

It's a little thing, but done billions of times, it makes a noticeable difference for us.  :)
Title: Re: DIM a (Max Index)
Post by: Qwerkey on October 17, 2018, 12:09:38 pm
Crumbs, Steve!  You know about optimising write speed to hards disks, as well?

Sonia, you will find that there is nothing that Steve doesn't know: he's our resident Man of Wisdom.
Title: Re: DIM a (Max Index)
Post by: Petr on October 17, 2018, 03:47:38 pm
Quote
Is nothing that Steve doesn't know: he's our resident Man of Wisdom.

I absolutely agree.
Title: Re: DIM a (Max Index)
Post by: soniaa on October 17, 2018, 05:40:18 pm
thank you all,
now I'm testing your suggestions,
then I will let you know which ones are most suitable for my needs.
Title: Re: DIM a (Max Index)
Post by: soniaa on October 17, 2018, 05:45:26 pm
my hardware;
Win 7 64bit
10 GB RAM
SSD 240GB for operating system
SSD 120GB for saving my data
Motherboard sata2  AMDphenomII945  3.00GHz  4processor
Title: Re: DIM a (Max Index)
Post by: SMcNeill on October 17, 2018, 05:56:16 pm
If the issue is just with the index, and not an actual memory issue, can you read and work with the data as a different variable type?

Instead of:

Dim q(8000000000) as _byte

Try:

Dim q(2000000000) as LONG

Once in memory, it's simple enough to break those longs down to individual bytes, if necessary.

A slight demo of this:

Code: QB64: [Select]
  1. DIM LongVar AS LONG
  2. DIM ByteVar(4) AS _BYTE
  3.  
  4. LongVar = &H01100620 '4 bytes in hex: 01/10/06/20
  5. ByteVar(1) = _BLUE32(LongVar)
  6. ByteVar(2) = _GREEN32(LongVar)
  7. ByteVar(3) = _RED32(LongVar)
  8. ByteVar(4) = _ALPHA32(LongVar)
  9.  
  10. FOR I = 1 TO 4
  11.    PRINT ByteVar(I)

Your output should be 32, 6, 16, 1.  (Feel free to reverse indexes, if you prefer to read them as 1, 16, 6, 32)

It's basically that simple, with us using _RGBA32 to rebuild the LONG back from our 4 bytes, if we need to write back to the disk.[/code]

****************

Most folks don't think of using the color commands with regular values, but all they do is separate a LONG into bytes, or assemble bytes into a LONG...  There's no reason why we can't make them work for other stuff like this for us.  ;)
Title: Re: DIM a (Max Index)
Post by: soniaa on October 17, 2018, 06:03:49 pm
my question;
in the program I'm making, I estimated that the processing process is 2 years.
I have to adopt more strategies to get the maximum speed.

In some phases of the program, I have numbers in the range (1 to 8.589.934.591) (33bit)
NOT all numbers in the range, ONLY 20%.

I have to check if the number I'm processing has already been processed and how many times.

Title: Re: DIM a (Max Index)
Post by: soniaa on October 17, 2018, 06:13:34 pm
The numbers I have,
is in thes style;

ARRAY  ( DIMensioned  _INTEGER64 )         -         (NO strigs)  message modificated
q1
q2
q3
....
last is q31

NO q(1) etccc

The value in those ARRAY  are in the range (1 - 8.589.934.590)  ( _INTEGER64 )
Title: Re: DIM a (Max Index)
Post by: codeguy on October 17, 2018, 06:38:33 pm
If you know only 20% will be different AND they are relatively random, 1.6 billion 20% of 8 billion. if it's 32-bit values, makes a shade over 6 billion bytes. Really, with a bit of squeezing in a b-tree, you might be able to squeeze that out quickly. TreeSort and use no disk for the sort. Also a HashTable can do the job. or even CountingSort. I'd recommend HashTable as the fastest way for random sets. Even ordered sets, but then why would you ask? But I digress. HashTables provide a fast, convenient way to search for things and correctly constructed, yield an average access of 1.25 seeks/find (or not) and very nearly  0(1) insertion. That too is another doable approach. If you cannot spare space for that and have a brick-wall limit of 6GB, you can run this in segments and mege the results. You could use SteveSort or HashListSort. Both will be pretty quick. HashListSort will even give you a sorted result too.

You're NOT being clear. How can you have single bytes and values well beyond single byte values. My answer remains, but you're jus not being clear at all and yes, I agree my answer is all over the place, but I'm taking into account numerous scenarios, none of which can be used unless you're more specific. FAR more specific.
Title: Re: DIM a (Max Index)
Post by: soniaa on October 17, 2018, 07:30:16 pm
If you know only 20% will be different AND they are relatively random, 1.6 billion 20% of 8 billion. if it's 32-bit values, makes a shade over 6 billion bytes. Really, with a bit of squeezing in a b-tree, you might be able to squeeze that out quickly. TreeSort and use no disk for the sort. Also a HashTable can do the job. or even CountingSort. I'd recommend HashTable as the fastest way for random sets. Even ordered sets, but then why would you ask? But I digress. HashTables provide a fast, convenient way to search for things and correctly constructed, yield an average access of 1.25 seeks/find (or not) and very nearly  0(1) insertion. That too is another doable approach. If you cannot spare space for that and have a brick-wall limit of 6GB, you can run this in segments and mege the results. You could use SteveSort or HashListSort. Both will be pretty quick. HashListSort will even give you a sorted result too.

I,m sorry,,,
I did not understand anything,,,,not a word,,,
Title: Re: DIM a (Max Index)
Post by: Cobalt on October 17, 2018, 07:43:15 pm
If you know only 20% will be different AND they are relatively random, 1.6 billion 20% of 8 billion. if it's 32-bit values, makes a shade over 6 billion bytes. Really, with a bit of squeezing in a b-tree, you might be able to squeeze that out quickly. TreeSort and use no disk for the sort. Also a HashTable can do the job. or even CountingSort. I'd recommend HashTable as the fastest way for random sets. Even ordered sets, but then why would you ask? But I digress. HashTables provide a fast, convenient way to search for things and correctly constructed, yield an average access of 1.25 seeks/find (or not) and very nearly  0(1) insertion. That too is another doable approach. If you cannot spare space for that and have a brick-wall limit of 6GB, you can run this in segments and mege the results. You could use SteveSort or HashListSort. Both will be pretty quick. HashListSort will even give you a sorted result too.

I,m sorry,,,
I did not understand anything,,,,not a word,,,

You are not alone there.  ?? :| ??
Title: Re: DIM a (Max Index)
Post by: bplus on October 17, 2018, 07:55:13 pm
;-)) "What was the question?"
Title: Re: DIM a (Max Index)
Post by: codeguy on October 17, 2018, 09:29:39 pm
Sorry my answer was all over the place. For a reasonable range, IntoSort version or QuickSort using radomized median of 3 sn't bad. IntroSort guarantees a REASONABLE and predictable performance and using the n/2 required extra storage version of MergeSort, you can sort up to 4 billion at a time and not exhaust ram if you use disk space for the temporary array. At most, it will require n/2 of the range being sorted if it even resorts to using that part. Othertwise, for a completely in-place, but not guaranteed stable sort, use IntoSort which uses QuickSort, HeapSort and InsertionSort. Using that approach, you can EASILY just use IntroSort and be guaranteed a reasonable completion time, regardless of the range.
Title: Re: DIM a (Max Index)
Post by: SMcNeill on October 17, 2018, 09:39:10 pm
The numbers I have,
is in thes style;
strigs;
q1
q2
q3
....
q33
NOT q(1) etccc
The value in those strings are in the range 1-255

Can you share an example portion of the datafile/code?  Like Codeguy, I'm lost as to how you can have both byte values and 33-bit values stored sequentially.  How do you know how many bits/bytes to assign to a value?  What delimiters are used?  What structure does this data have?

More details are needed, if we're going to try and sort out a way to help speed things up for you.
Title: Re: DIM a (Max Index)
Post by: codeguy on October 17, 2018, 10:42:49 pm
if you ever get this figured out, here's a decent in-place algorithm to use:
performance (millisecond) profile for DOUBLE precision array until 134M+ elements :
 1 0
 3 0
 7 0
 15 0
 31 0
 63 0
 127 0
 255 0
 511 0
 1023 .9765625
 2047 .9765625
 4095 2.929688
 8191 5.859375
 16383 17.08984
 32767 51.26953
 65535 136.2305
 131071 241.6992
 262143 415.0391
 524287 946.7773
 1048575 1986.816
 2097151 4613.77
 4194303 9541.016
 8388607 18500.98
 16777215 42821.29
 33554431 84074.22
 67108863 186534.7
 134217727 379580.1

demo code:
Code: QB64: [Select]
  1. n& = 0
  2.     n& = n& + 1
  3.     REDIM a(0 TO n&) AS DOUBLE '* 65s
  4.     DIM u AS LONG
  5.     FOR u = LBOUND(a) TO UBOUND(a)
  6.         a(u) = RND
  7.     NEXT
  8.     s! = TIMER(.001)
  9.     QuickSortIntrospective a(), LBOUND(a), UBOUND(a), 1
  10.     f! = TIMER(.001)
  11.     'd$ = "." + STRING$(15, "#")
  12.     DIM l AS LONG
  13.     l = LBOUND(a)
  14.     FOR u = LBOUND(a) TO UBOUND(a)
  15.         IF a(u) < a(l) THEN STOP
  16.         'PRINT USING d$; a(u);
  17.         l = u
  18.     NEXT
  19.     _CLIPBOARD$ = _CLIPBOARD$ + STR$(n&) + STR$(1000 * (f! - s!)) + CHR$(13) + CHR$(10)
  20.     n& = n& * 2
  21.  

Complete IntroSort, in-place (uses no auxiliary) and combines QuickSort, HeapSort and InsertionSort. This will allow the maximum range of in-memory sorting in reasonable time.
Code: QB64: [Select]
  1. '****************************************
  2. '* The IntroSort() algorithm extended to QBxx because there is no qbxx-compatible code
  3. '* The IntroSort algorithm extended to qb64 because i could find no pure qbxx code
  4. '* 03Jun2017, by CodeGuy -- further mods for use in sorting library 03 Aug 2017
  5. '* Introspective Sort (IntroSort) falls back to MergeSort after so many levels of
  6. '* recursion and is good for highly redundant data (aka few unique)
  7. '* for very short runs, it falls back to InsertionSort.
  8.  
  9. SUB QuickSortIntrospective (CGSortLibArr() AS DOUBLE, IntroSort_start AS LONG, IntroSort_finish AS LONG, order AS INTEGER)
  10.     DIM IntroSort_i AS LONG
  11.     DIM IntroSort_J AS LONG
  12.     STATIC IntroSort_level AS INTEGER
  13.     STATIC IntroSort_MaxRecurseLevel AS INTEGER
  14.     IntroSort_MaxRecurseLevel = 15
  15.     IF IntroSort_start < IntroSort_finish THEN
  16.         IF IntroSort_finish - IntroSort_start > 31 THEN
  17.             IF IntroSort_level > IntroSort_MaxRecurseLevel THEN
  18.                 HeapSort CGSortLibArr(), IntroSort_start, IntroSort_finish, order
  19.             ELSE
  20.                 IntroSort_level = IntroSort_level + 1
  21.                 QuickSortIJ CGSortLibArr(), IntroSort_start, IntroSort_finish, IntroSort_i, IntroSort_J, order
  22.                 QuickSortIntrospective CGSortLibArr(), IntroSort_start, IntroSort_J, order
  23.                 QuickSortIntrospective CGSortLibArr(), IntroSort_i, IntroSort_finish, order
  24.                 IntroSort_level = IntroSort_level - 1
  25.             END IF
  26.         ELSE
  27.             InsertionSort CGSortLibArr(), IntroSort_start, IntroSort_finish, order
  28.         END IF
  29.     END IF
  30.  
  31. SUB QuickSortIJ (CGSortLibArr() AS DOUBLE, start AS LONG, finish AS LONG, i AS LONG, j AS LONG, order AS INTEGER)
  32.     DIM Compare AS DOUBLE '* MUST be the same type as CGSortLibArr()
  33.     i = start
  34.     j = finish
  35.     Compare = CGSortLibArr(i + (j - i) \ 2)
  36.     SELECT CASE order
  37.         CASE 1
  38.             DO
  39.                 DO WHILE CGSortLibArr(i) < Compare
  40.                     i = i + 1
  41.                 LOOP
  42.                 DO WHILE CGSortLibArr(j) > Compare
  43.                     j = j - 1
  44.                 LOOP
  45.                 IF i <= j THEN
  46.                     IF i <> j THEN
  47.                         SWAP CGSortLibArr(i), CGSortLibArr(j)
  48.                     END IF
  49.                     i = i + 1
  50.                     j = j - 1
  51.                 END IF
  52.             LOOP UNTIL i > j
  53.         CASE ELSE
  54.             DO
  55.                 DO WHILE CGSortLibArr(i) > Compare
  56.                     i = i + 1
  57.                 LOOP
  58.                 DO WHILE CGSortLibArr(j) < Compare
  59.                     j = j - 1
  60.                 LOOP
  61.                 IF i <= j THEN
  62.                     IF i <> j THEN
  63.                         SWAP CGSortLibArr(i), CGSortLibArr(j)
  64.                     END IF
  65.                     i = i + 1
  66.                     j = j - 1
  67.                 END IF
  68.             LOOP UNTIL i > j
  69.     END SELECT
  70.  
  71. SUB InsertionSort (CGSortLibArr() AS DOUBLE, start AS LONG, finish AS LONG, order AS INTEGER)
  72.     DIM InSort_Local_ArrayTemp AS DOUBLE
  73.     DIM InSort_Local_i AS LONG
  74.     DIM InSort_Local_j AS LONG
  75.     SELECT CASE order
  76.         CASE 1
  77.             FOR InSort_Local_i = start + 1 TO finish
  78.                 InSort_Local_j = InSort_Local_i - 1
  79.                 IF CGSortLibArr(InSort_Local_i) < CGSortLibArr(InSort_Local_j) THEN
  80.                     InSort_Local_ArrayTemp = CGSortLibArr(InSort_Local_i)
  81.                     DO UNTIL InSort_Local_j < start
  82.                         IF (InSort_Local_ArrayTemp < CGSortLibArr(InSort_Local_j)) THEN
  83.                             CGSortLibArr(InSort_Local_j + 1) = CGSortLibArr(InSort_Local_j)
  84.                             InSort_Local_j = InSort_Local_j - 1
  85.                         ELSE
  86.                             EXIT DO
  87.                         END IF
  88.                     LOOP
  89.                     CGSortLibArr(InSort_Local_j + 1) = InSort_Local_ArrayTemp
  90.                 END IF
  91.             NEXT
  92.         CASE ELSE
  93.             FOR InSort_Local_i = start + 1 TO finish
  94.                 InSort_Local_j = InSort_Local_i - 1
  95.                 IF CGSortLibArr(InSort_Local_i) > CGSortLibArr(InSort_Local_j) THEN
  96.                     InSort_Local_ArrayTemp = CGSortLibArr(InSort_Local_i)
  97.                     DO UNTIL InSort_Local_j < start
  98.                         IF (InSort_Local_ArrayTemp > CGSortLibArr(InSort_Local_j)) THEN
  99.                             CGSortLibArr(InSort_Local_j + 1) = CGSortLibArr(InSort_Local_j)
  100.                             InSort_Local_j = InSort_Local_j - 1
  101.                         ELSE
  102.                             EXIT DO
  103.                         END IF
  104.                     LOOP
  105.                     CGSortLibArr(InSort_Local_j + 1) = InSort_Local_ArrayTemp
  106.                 END IF
  107.             NEXT
  108.     END SELECT
  109.  
  110. SUB HeapSort (CGSortLibArr() AS DOUBLE, start AS LONG, finish AS LONG, order AS INTEGER)
  111.     DIM i AS LONG
  112.     DIM j AS LONG
  113.     FOR i = start + 1 TO finish
  114.         PercolateUp CGSortLibArr(), start, i, order
  115.     NEXT i
  116.  
  117.     FOR i = finish TO start + 1 STEP -1
  118.         SWAP CGSortLibArr(start), CGSortLibArr(i)
  119.         PercolateDown CGSortLibArr(), start, i - 1, order
  120.     NEXT i
  121.  
  122. SUB PercolateDown (CGSortLibArr() AS DOUBLE, start AS LONG, MaxLevel AS LONG, order AS INTEGER)
  123.     DIM i AS LONG
  124.     DIM child AS LONG
  125.     DIM ax AS LONG
  126.     i = start
  127.     '* Move the value in GetPixel!!!(start) down the heap until it has
  128.     '* reached its proper node (that is, until it is less than its parent
  129.     '* node or until it has reached MaxLevel!!!, the bottom of the current heap):
  130.     DO
  131.         child = 2 * (i - start) + start ' Get the subscript for the Child node.
  132.         '* Reached the bottom of the heap, so exit this procedure:
  133.         IF child > MaxLevel THEN EXIT DO
  134.         SELECT CASE order
  135.             CASE 1
  136.                 '* If there are two Child nodes, find out which one is bigger:
  137.                 ax = child + 1
  138.                 IF ax <= MaxLevel THEN
  139.                     IF CGSortLibArr(ax) > CGSortLibArr(child) THEN
  140.                         child = ax
  141.                     END IF
  142.                 END IF
  143.  
  144.                 '* Move the value down if it is still not bigger than either one of
  145.                 '* its Child!!!ren:
  146.                 IF CGSortLibArr(i) < CGSortLibArr(child) THEN
  147.                     SWAP CGSortLibArr(i), CGSortLibArr(child)
  148.                     i = child
  149.                 ELSE
  150.                     '* Otherwise, CGSortLibArr() has been restored to a heap from start to MaxLevel!!!,
  151.                     '* so exit:
  152.                     EXIT DO
  153.                 END IF
  154.             CASE ELSE
  155.                 '* If there are two Child nodes, find out which one is smaller:
  156.                 ax = child + 1
  157.                 IF ax <= MaxLevel THEN
  158.                     IF CGSortLibArr(ax) < CGSortLibArr(child) THEN
  159.                         child = ax
  160.                     END IF
  161.                 END IF
  162.                 '* Move the value down if it is still not smaller than either one of
  163.                 '* its Child!!!ren:
  164.                 IF CGSortLibArr(i) > CGSortLibArr(child) THEN
  165.                     SWAP CGSortLibArr(i), CGSortLibArr(child)
  166.                     i = child
  167.                 ELSE
  168.                     '* Otherwise, CGSortLibArr() has been restored to a heap from start to MaxLevel!!!,
  169.                     '* so exit:
  170.                     EXIT DO
  171.                 END IF
  172.         END SELECT
  173.     LOOP
  174.  
  175. SUB PercolateUp (CGSortLibArr() AS DOUBLE, start AS LONG, MaxLevel AS LONG, order AS INTEGER)
  176.     DIM parent AS LONG
  177.     DIM i AS LONG
  178.     SELECT CASE order
  179.         CASE 1
  180.             i = MaxLevel
  181.             '* Move the value in CGSortLibArr(MaxLevel!!!) up the heap until it has
  182.             '* reached its proper node (that is, until it is greater than either
  183.             '* of its Child!!! nodes, or until it has reached 1, the top of the heap):
  184.             DO UNTIL i = start
  185.                 '* Get the subscript for the parent node.
  186.                 parent = start + (i - start) \ 2
  187.                 '* The value at the current node is still bigger than the value at
  188.                 '* its parent node, so swap these two array elements:
  189.                 IF CGSortLibArr(i) > CGSortLibArr(parent) THEN
  190.                     SWAP CGSortLibArr(parent), CGSortLibArr(i)
  191.                     i = parent
  192.                 ELSE
  193.                     '* Otherwise, the element has reached its proper place in the heap,
  194.                     '* so exit this procedure:
  195.                     EXIT DO
  196.                 END IF
  197.             LOOP
  198.         CASE ELSE
  199.             i = MaxLevel
  200.             '* Move the value in CGSortLibArr(MaxLevel!!!) up the heap until it has
  201.             '* reached its proper node (that is, until it is greater than either
  202.             '* of its Child!!! nodes, or until it has reached 1, the top of the heap):
  203.             DO UNTIL i = start
  204.                 '* Get the subscript for the parent node.
  205.                 parent = start + (i - start) \ 2
  206.                 '* The value at the current node is still smaller than the value at
  207.                 '* its parent node, so swap these two array elements:
  208.                 IF CGSortLibArr(i) < CGSortLibArr(parent) THEN
  209.                     SWAP CGSortLibArr(parent), CGSortLibArr(i)
  210.                     i = parent
  211.                 ELSE
  212.                     '* Otherwise, the element has reached its proper place in the heap,
  213.                     '* so exit this procedure:
  214.                     EXIT DO
  215.                 END IF
  216.             LOOP
  217.     END SELECT
  218.  
IntroSort, completely in-place. Don't forget to change AS LONG to as _INTEGER64 to avoid overflow problems on indexes and change types as appropriate.


code to use _UNSIGNED LONG for indexes into arrays
Code: QB64: [Select]
  1. testn = 0
  2. '_CLIPBOARD$ = ""
  3.     testn = testn + 1
  4.     REDIM a(0 TO testn) AS DOUBLE '* 65s
  5.     FOR u = LBOUND(a) TO UBOUND(a)
  6.         a(u) = RND
  7.     NEXT
  8.     s! = TIMER(.001)
  9.     QuickSortIntrospective a(), LBOUND(a), UBOUND(a), 1
  10.     f! = TIMER(.001)
  11.     'd$ = "." + STRING$(15, "#")
  12.     DIM lp AS _UNSIGNED LONG
  13.     lp = LBOUND(a)
  14.     FOR u = LBOUND(a) TO UBOUND(a)
  15.         IF a(u) < a(lp) THEN STOP
  16.         'PRINT USING d$; a(u);
  17.         lp = u
  18.     NEXT
  19.     ''_CLIPBOARD$ = _CLIPBOARD$ + STR$(testn) + STR$(1000 * (f! - s!)) + CHR$(13) + CHR$(10)
  20.     PRINT STR$(testn); STR$(1000 * (f! - s!))
  21.     testn = testn * 2
  22.  
  23. '****************************************
  24. '* The IntroSort() algorithm extended to QBxx because there is no qbxx-compatible code
  25. '* The IntroSort algorithm extended to qb64 because i could find no pure qbxx code
  26. '* 03Jun2017, by CodeGuy -- further mods for use in sorting library 03 Aug 2017
  27. '* Introspective Sort (IntroSort) falls back to MergeSort after so many levels of
  28. '* recursion and is good for highly redundant data (aka few unique)
  29. '* for very short runs, it falls back to InsertionSort.
  30.  
  31. SUB QuickSortIntrospective (CGSortLibArr() AS DOUBLE, IntroSort_start AS _UNSIGNED LONG, IntroSort_finish AS _UNSIGNED LONG, order AS INTEGER)
  32.     DIM IntroSort_i AS _UNSIGNED LONG
  33.     DIM IntroSort_J AS _UNSIGNED LONG
  34.     STATIC IntroSort_level AS INTEGER
  35.     STATIC IntroSort_MaxRecurseLevel AS INTEGER
  36.     IntroSort_MaxRecurseLevel = 15
  37.     IF IntroSort_start < IntroSort_finish THEN
  38.         IF IntroSort_finish - IntroSort_start > 31 THEN
  39.             IF IntroSort_level > IntroSort_MaxRecurseLevel THEN
  40.                 HeapSort CGSortLibArr(), IntroSort_start, IntroSort_finish, order
  41.             ELSE
  42.                 IntroSort_level = IntroSort_level + 1
  43.                 QuickSortIJ CGSortLibArr(), IntroSort_start, IntroSort_finish, IntroSort_i, IntroSort_J, order
  44.                 QuickSortIntrospective CGSortLibArr(), IntroSort_start, IntroSort_J, order
  45.                 QuickSortIntrospective CGSortLibArr(), IntroSort_i, IntroSort_finish, order
  46.                 IntroSort_level = IntroSort_level - 1
  47.             END IF
  48.         ELSE
  49.             InsertionSort CGSortLibArr(), IntroSort_start, IntroSort_finish, order
  50.         END IF
  51.     END IF
  52.  
  53. SUB QuickSortIJ (CGSortLibArr() AS DOUBLE, start AS _UNSIGNED LONG, finish AS _UNSIGNED LONG, i AS _UNSIGNED LONG, j AS _UNSIGNED LONG, order AS INTEGER)
  54.     DIM Compare AS DOUBLE '* MUST be the same type as CGSortLibArr()
  55.     i = start
  56.     j = finish
  57.     Compare = CGSortLibArr(i + (j - i) \ 2)
  58.     SELECT CASE order
  59.         CASE 1
  60.             DO
  61.                 DO WHILE CGSortLibArr(i) < Compare
  62.                     i = i + 1
  63.                 LOOP
  64.                 DO WHILE CGSortLibArr(j) > Compare
  65.                     j = j - 1
  66.                 LOOP
  67.                 IF i <= j THEN
  68.                     IF i <> j THEN
  69.                         SWAP CGSortLibArr(i), CGSortLibArr(j)
  70.                     END IF
  71.                     i = i + 1
  72.                     j = j - 1
  73.                 END IF
  74.             LOOP UNTIL i > j
  75.         CASE ELSE
  76.             DO
  77.                 DO WHILE CGSortLibArr(i) > Compare
  78.                     i = i + 1
  79.                 LOOP
  80.                 DO WHILE CGSortLibArr(j) < Compare
  81.                     j = j - 1
  82.                 LOOP
  83.                 IF i <= j THEN
  84.                     IF i <> j THEN
  85.                         SWAP CGSortLibArr(i), CGSortLibArr(j)
  86.                     END IF
  87.                     i = i + 1
  88.                     j = j - 1
  89.                 END IF
  90.             LOOP UNTIL i > j
  91.     END SELECT
  92.  
  93. SUB InsertionSort (CGSortLibArr() AS DOUBLE, start AS _UNSIGNED LONG, finish AS _UNSIGNED LONG, order AS INTEGER)
  94.     DIM InSort_Local_ArrayTemp AS DOUBLE
  95.     DIM InSort_Local_i AS _UNSIGNED LONG
  96.     DIM InSort_Local_j AS _UNSIGNED LONG
  97.     SELECT CASE order
  98.         CASE 1
  99.             FOR InSort_Local_i = start + 1 TO finish
  100.                 InSort_Local_j = InSort_Local_i - 1
  101.                 IF CGSortLibArr(InSort_Local_i) < CGSortLibArr(InSort_Local_j) THEN
  102.                     InSort_Local_ArrayTemp = CGSortLibArr(InSort_Local_i)
  103.                     DO WHILE InSort_Local_j + 1 > start
  104.                         IF (InSort_Local_ArrayTemp < CGSortLibArr(InSort_Local_j)) THEN
  105.                             CGSortLibArr(InSort_Local_j + 1) = CGSortLibArr(InSort_Local_j)
  106.                             InSort_Local_j = InSort_Local_j - 1
  107.                         ELSE
  108.                             EXIT DO
  109.                         END IF
  110.                     LOOP
  111.                     CGSortLibArr(InSort_Local_j + 1) = InSort_Local_ArrayTemp
  112.                 END IF
  113.             NEXT
  114.         CASE ELSE
  115.             FOR InSort_Local_i = start + 1 TO finish
  116.                 InSort_Local_j = InSort_Local_i - 1
  117.                 IF CGSortLibArr(InSort_Local_i) > CGSortLibArr(InSort_Local_j) THEN
  118.                     InSort_Local_ArrayTemp = CGSortLibArr(InSort_Local_i)
  119.                     DO WHILE InSort_Local_j + 1 > start
  120.                         IF (InSort_Local_ArrayTemp > CGSortLibArr(InSort_Local_j)) THEN
  121.                             CGSortLibArr(InSort_Local_j + 1) = CGSortLibArr(InSort_Local_j)
  122.                             InSort_Local_j = InSort_Local_j - 1
  123.                         ELSE
  124.                             EXIT DO
  125.                         END IF
  126.                     LOOP
  127.                     CGSortLibArr(InSort_Local_j + 1) = InSort_Local_ArrayTemp
  128.                 END IF
  129.             NEXT
  130.     END SELECT
  131.  
  132. SUB HeapSort (CGSortLibArr() AS DOUBLE, start AS _UNSIGNED LONG, finish AS _UNSIGNED LONG, order AS INTEGER)
  133.     FOR i = start + 1 TO finish
  134.         PercolateUp CGSortLibArr(), start, i, order
  135.     NEXT i
  136.  
  137.     FOR i = finish TO start + 1 STEP -1
  138.         SWAP CGSortLibArr(start), CGSortLibArr(i)
  139.         PercolateDown CGSortLibArr(), start, i - 1, order
  140.     NEXT i
  141.  
  142. SUB PercolateDown (CGSortLibArr() AS DOUBLE, start AS _UNSIGNED LONG, MaxLevel AS _UNSIGNED LONG, order AS INTEGER)
  143.     DIM child AS _UNSIGNED LONG
  144.     DIM ax AS _UNSIGNED LONG
  145.     i = start
  146.     '* Move the value in GetPixel!!!(start) down the heap until it has
  147.     '* reached its proper node (that is, until it is less than its parent
  148.     '* node or until it has reached MaxLevel!!!, the bottom of the current heap):
  149.     DO
  150.         child = 2 * (i - start) + start ' Get the subscript for the Child node.
  151.         '* Reached the bottom of the heap, so exit this procedure:
  152.         IF child > MaxLevel THEN EXIT DO
  153.         SELECT CASE order
  154.             CASE 1
  155.                 '* If there are two Child nodes, find out which one is bigger:
  156.                 ax = child + 1
  157.                 IF ax <= MaxLevel THEN
  158.                     IF CGSortLibArr(ax) > CGSortLibArr(child) THEN
  159.                         child = ax
  160.                     END IF
  161.                 END IF
  162.  
  163.                 '* Move the value down if it is still not bigger than either one of
  164.                 '* its Child!!!ren:
  165.                 IF CGSortLibArr(i) < CGSortLibArr(child) THEN
  166.                     SWAP CGSortLibArr(i), CGSortLibArr(child)
  167.                     i = child
  168.                 ELSE
  169.                     '* Otherwise, CGSortLibArr() has been restored to a heap from start to MaxLevel!!!,
  170.                     '* so exit:
  171.                     EXIT DO
  172.                 END IF
  173.             CASE ELSE
  174.                 '* If there are two Child nodes, find out which one is smaller:
  175.                 ax = child + 1
  176.                 IF ax <= MaxLevel THEN
  177.                     IF CGSortLibArr(ax) < CGSortLibArr(child) THEN
  178.                         child = ax
  179.                     END IF
  180.                 END IF
  181.                 '* Move the value down if it is still not smaller than either one of
  182.                 '* its Child!!!ren:
  183.                 IF CGSortLibArr(i) > CGSortLibArr(child) THEN
  184.                     SWAP CGSortLibArr(i), CGSortLibArr(child)
  185.                     i = child
  186.                 ELSE
  187.                     '* Otherwise, CGSortLibArr() has been restored to a heap from start to MaxLevel!!!,
  188.                     '* so exit:
  189.                     EXIT DO
  190.                 END IF
  191.         END SELECT
  192.     LOOP
  193.  
  194. SUB PercolateUp (CGSortLibArr() AS DOUBLE, start AS _UNSIGNED LONG, MaxLevel AS _UNSIGNED LONG, order AS INTEGER)
  195.     DIM parent AS _UNSIGNED LONG
  196.     SELECT CASE order
  197.         CASE 1
  198.             i = MaxLevel
  199.             '* Move the value in CGSortLibArr(MaxLevel!!!) up the heap until it has
  200.             '* reached its proper node (that is, until it is greater than either
  201.             '* of its Child!!! nodes, or until it has reached 1, the top of the heap):
  202.             DO UNTIL i = start
  203.                 '* Get the subscript for the parent node.
  204.                 parent = start + (i - start) \ 2
  205.                 '* The value at the current node is still bigger than the value at
  206.                 '* its parent node, so swap these two array elements:
  207.                 IF CGSortLibArr(i) > CGSortLibArr(parent) THEN
  208.                     SWAP CGSortLibArr(parent), CGSortLibArr(i)
  209.                     i = parent
  210.                 ELSE
  211.                     '* Otherwise, the element has reached its proper place in the heap,
  212.                     '* so exit this procedure:
  213.                     EXIT DO
  214.                 END IF
  215.             LOOP
  216.         CASE ELSE
  217.             i = MaxLevel
  218.             '* Move the value in CGSortLibArr(MaxLevel!!!) up the heap until it has
  219.             '* reached its proper node (that is, until it is greater than either
  220.             '* of its Child!!! nodes, or until it has reached 1, the top of the heap):
  221.             DO UNTIL i = start
  222.                 '* Get the subscript for the parent node.
  223.                 parent = start + (i - start) \ 2
  224.                 '* The value at the current node is still smaller than the value at
  225.                 '* its parent node, so swap these two array elements:
  226.                 IF CGSortLibArr(i) < CGSortLibArr(parent) THEN
  227.                     SWAP CGSortLibArr(parent), CGSortLibArr(i)
  228.                     i = parent
  229.                 ELSE
  230.                     '* Otherwise, the element has reached its proper place in the heap,
  231.                     '* so exit this procedure:
  232.                     EXIT DO
  233.                 END IF
  234.             LOOP
  235.     END SELECT
  236.  

InsertionSort was not very friendly to _UNSIGNED until modified this way:

Code: QB64: [Select]
  1. testn = 0
  2.     testn = testn + 1
  3.     REDIM a(0 TO testn) AS DOUBLE '* 65s
  4.     FOR u = LBOUND(a) TO UBOUND(a)
  5.         a(u) = RND
  6.     NEXT
  7.     s! = TIMER(.001)
  8.     QuickSortIntrospective a(), LBOUND(a), UBOUND(a), 1
  9.     f! = TIMER(.001)
  10.     'd$ = "." + STRING$(15, "#")
  11.     DIM lp AS _UNSIGNED LONG
  12.     lp = LBOUND(a)
  13.     FOR u = LBOUND(a) TO UBOUND(a)
  14.         IF a(u) < a(lp) THEN STOP
  15.         'PRINT USING d$; a(u);
  16.         lp = u
  17.     NEXT
  18.     _CLIPBOARD$ = _CLIPBOARD$ + STR$(testn) + STR$(1000 * (f! - s!)) + CHR$(13) + CHR$(10)
  19.     testn = testn * 2
  20.  
  21.  
  22. '****************************************
  23. '* The IntroSort() algorithm extended to QBxx because there is no qbxx-compatible code
  24. '* The IntroSort algorithm extended to qb64 because i could find no pure qbxx code
  25. '* 03Jun2017, by CodeGuy -- further mods for use in sorting library 03 Aug 2017
  26. '* Introspective Sort (IntroSort) falls back to MergeSort after so many levels of
  27. '* recursion and is good for highly redundant data (aka few unique)
  28. '* for very short runs, it falls back to InsertionSort.
  29.  
  30. SUB QuickSortIntrospective (CGSortLibArr() AS DOUBLE, IntroSort_start AS _UNSIGNED LONG, IntroSort_finish AS _UNSIGNED LONG, order AS INTEGER)
  31.     DIM IntroSort_i AS _UNSIGNED LONG
  32.     DIM IntroSort_J AS _UNSIGNED LONG
  33.     STATIC IntroSort_level AS INTEGER
  34.     STATIC IntroSort_MaxRecurseLevel AS INTEGER
  35.     IntroSort_MaxRecurseLevel = 15
  36.     IF IntroSort_start < IntroSort_finish THEN
  37.         IF IntroSort_finish - IntroSort_start > 31 THEN
  38.             IF IntroSort_level > IntroSort_MaxRecurseLevel THEN
  39.                 HeapSort CGSortLibArr(), IntroSort_start, IntroSort_finish, order
  40.             ELSE
  41.                 IntroSort_level = IntroSort_level + 1
  42.                 QuickSortIJ CGSortLibArr(), IntroSort_start, IntroSort_finish, IntroSort_i, IntroSort_J, order
  43.                 QuickSortIntrospective CGSortLibArr(), IntroSort_start, IntroSort_J, order
  44.                 QuickSortIntrospective CGSortLibArr(), IntroSort_i, IntroSort_finish, order
  45.                 IntroSort_level = IntroSort_level - 1
  46.             END IF
  47.         ELSE
  48.             InsertionSort CGSortLibArr(), IntroSort_start, IntroSort_finish, order
  49.         END IF
  50.     END IF
  51.  
  52. SUB QuickSortIJ (CGSortLibArr() AS DOUBLE, start AS _UNSIGNED LONG, finish AS _UNSIGNED LONG, i AS _UNSIGNED LONG, j AS _UNSIGNED LONG, order AS INTEGER)
  53.     DIM Compare AS DOUBLE '* MUST be the same type as CGSortLibArr()
  54.     i = start
  55.     j = finish
  56.     Compare = CGSortLibArr(i + (j - i) \ 2)
  57.     SELECT CASE order
  58.         CASE 1
  59.             DO
  60.                 DO WHILE CGSortLibArr(i) < Compare
  61.                     i = i + 1
  62.                 LOOP
  63.                 DO WHILE CGSortLibArr(j) > Compare
  64.                     j = j - 1
  65.                 LOOP
  66.                 IF i <= j THEN
  67.                     IF i <> j THEN
  68.                         SWAP CGSortLibArr(i), CGSortLibArr(j)
  69.                     END IF
  70.                     i = i + 1
  71.                     j = j - 1
  72.                 END IF
  73.             LOOP UNTIL i > j
  74.         CASE ELSE
  75.             DO
  76.                 DO WHILE CGSortLibArr(i) > Compare
  77.                     i = i + 1
  78.                 LOOP
  79.                 DO WHILE CGSortLibArr(j) < Compare
  80.                     j = j - 1
  81.                 LOOP
  82.                 IF i <= j THEN
  83.                     IF i <> j THEN
  84.                         SWAP CGSortLibArr(i), CGSortLibArr(j)
  85.                     END IF
  86.                     i = i + 1
  87.                     j = j - 1
  88.                 END IF
  89.             LOOP UNTIL i > j
  90.     END SELECT
  91.  
  92. SUB InsertionSort (CGSortLibArr() AS DOUBLE, start AS _UNSIGNED LONG, finish AS _UNSIGNED LONG, order AS INTEGER)
  93.     DIM InSort_Local_ArrayTemp AS DOUBLE
  94.     DIM InSort_Local_i AS _UNSIGNED LONG
  95.     DIM InSort_Local_j AS _UNSIGNED LONG
  96.     SELECT CASE order
  97.         CASE 1
  98.             FOR InSort_Local_i = start + 1 TO finish
  99.                 InSort_Local_j = InSort_Local_i - 1
  100.                 IF CGSortLibArr(InSort_Local_i) < CGSortLibArr(InSort_Local_j) THEN
  101.                     InSort_Local_ArrayTemp = CGSortLibArr(InSort_Local_i)
  102.                     DO WHILE InSort_Local_j + 1 > start
  103.                         IF (InSort_Local_ArrayTemp < CGSortLibArr(InSort_Local_j)) THEN
  104.                             CGSortLibArr(InSort_Local_j + 1) = CGSortLibArr(InSort_Local_j)
  105.                             InSort_Local_j = InSort_Local_j - 1
  106.                         ELSE
  107.                             EXIT DO
  108.                         END IF
  109.                     LOOP
  110.                     CGSortLibArr(InSort_Local_j + 1) = InSort_Local_ArrayTemp
  111.                 END IF
  112.             NEXT
  113.         CASE ELSE
  114.             FOR InSort_Local_i = start + 1 TO finish
  115.                 InSort_Local_j = InSort_Local_i - 1
  116.                 IF CGSortLibArr(InSort_Local_i) > CGSortLibArr(InSort_Local_j) THEN
  117.                     InSort_Local_ArrayTemp = CGSortLibArr(InSort_Local_i)
  118.                     DO WHILE InSort_Local_j + 1 > start
  119.                         IF (InSort_Local_ArrayTemp > CGSortLibArr(InSort_Local_j)) THEN
  120.                             CGSortLibArr(InSort_Local_j + 1) = CGSortLibArr(InSort_Local_j)
  121.                             InSort_Local_j = InSort_Local_j - 1
  122.                         ELSE
  123.                             EXIT DO
  124.                         END IF
  125.                     LOOP
  126.                     CGSortLibArr(InSort_Local_j + 1) = InSort_Local_ArrayTemp
  127.                 END IF
  128.             NEXT
  129.     END SELECT
  130.  
  131. SUB HeapSort (CGSortLibArr() AS DOUBLE, start AS _UNSIGNED LONG, finish AS _UNSIGNED LONG, order AS INTEGER)
  132.     FOR i = start + 1 TO finish
  133.         PercolateUp CGSortLibArr(), start, i, order
  134.     NEXT i
  135.  
  136.     FOR i = finish TO start + 1 STEP -1
  137.         SWAP CGSortLibArr(start), CGSortLibArr(i)
  138.         PercolateDown CGSortLibArr(), start, i - 1, order
  139.     NEXT i
  140.  
  141. SUB PercolateDown (CGSortLibArr() AS DOUBLE, start AS _UNSIGNED LONG, MaxLevel AS _UNSIGNED LONG, order AS INTEGER)
  142.     DIM child AS _UNSIGNED LONG
  143.     DIM ax AS _UNSIGNED LONG
  144.     i = start
  145.     '* Move the value in GetPixel!!!(start) down the heap until it has
  146.     '* reached its proper node (that is, until it is less than its parent
  147.     '* node or until it has reached MaxLevel!!!, the bottom of the current heap):
  148.     DO
  149.         child = 2 * (i - start) + start ' Get the subscript for the Child node.
  150.         '* Reached the bottom of the heap, so exit this procedure:
  151.         IF child > MaxLevel THEN EXIT DO
  152.         SELECT CASE order
  153.             CASE 1
  154.                 '* If there are two Child nodes, find out which one is bigger:
  155.                 ax = child + 1
  156.                 IF ax <= MaxLevel THEN
  157.                     IF CGSortLibArr(ax) > CGSortLibArr(child) THEN
  158.                         child = ax
  159.                     END IF
  160.                 END IF
  161.  
  162.                 '* Move the value down if it is still not bigger than either one of
  163.                 '* its Child!!!ren:
  164.                 IF CGSortLibArr(i) < CGSortLibArr(child) THEN
  165.                     SWAP CGSortLibArr(i), CGSortLibArr(child)
  166.                     i = child
  167.                 ELSE
  168.                     '* Otherwise, CGSortLibArr() has been restored to a heap from start to MaxLevel!!!,
  169.                     '* so exit:
  170.                     EXIT DO
  171.                 END IF
  172.             CASE ELSE
  173.                 '* If there are two Child nodes, find out which one is smaller:
  174.                 ax = child + 1
  175.                 IF ax <= MaxLevel THEN
  176.                     IF CGSortLibArr(ax) < CGSortLibArr(child) THEN
  177.                         child = ax
  178.                     END IF
  179.                 END IF
  180.                 '* Move the value down if it is still not smaller than either one of
  181.                 '* its Child!!!ren:
  182.                 IF CGSortLibArr(i) > CGSortLibArr(child) THEN
  183.                     SWAP CGSortLibArr(i), CGSortLibArr(child)
  184.                     i = child
  185.                 ELSE
  186.                     '* Otherwise, CGSortLibArr() has been restored to a heap from start to MaxLevel!!!,
  187.                     '* so exit:
  188.                     EXIT DO
  189.                 END IF
  190.         END SELECT
  191.     LOOP
  192.  
  193. SUB PercolateUp (CGSortLibArr() AS DOUBLE, start AS _UNSIGNED LONG, MaxLevel AS _UNSIGNED LONG, order AS INTEGER)
  194.     DIM parent AS _UNSIGNED LONG
  195.     SELECT CASE order
  196.         CASE 1
  197.             i = MaxLevel
  198.             '* Move the value in CGSortLibArr(MaxLevel!!!) up the heap until it has
  199.             '* reached its proper node (that is, until it is greater than either
  200.             '* of its Child!!! nodes, or until it has reached 1, the top of the heap):
  201.             DO UNTIL i = start
  202.                 '* Get the subscript for the parent node.
  203.                 parent = start + (i - start) \ 2
  204.                 '* The value at the current node is still bigger than the value at
  205.                 '* its parent node, so swap these two array elements:
  206.                 IF CGSortLibArr(i) > CGSortLibArr(parent) THEN
  207.                     SWAP CGSortLibArr(parent), CGSortLibArr(i)
  208.                     i = parent
  209.                 ELSE
  210.                     '* Otherwise, the element has reached its proper place in the heap,
  211.                     '* so exit this procedure:
  212.                     EXIT DO
  213.                 END IF
  214.             LOOP
  215.         CASE ELSE
  216.             i = MaxLevel
  217.             '* Move the value in CGSortLibArr(MaxLevel!!!) up the heap until it has
  218.             '* reached its proper node (that is, until it is greater than either
  219.             '* of its Child!!! nodes, or until it has reached 1, the top of the heap):
  220.             DO UNTIL i = start
  221.                 '* Get the subscript for the parent node.
  222.                 parent = start + (i - start) \ 2
  223.                 '* The value at the current node is still smaller than the value at
  224.                 '* its parent node, so swap these two array elements:
  225.                 IF CGSortLibArr(i) < CGSortLibArr(parent) THEN
  226.                     SWAP CGSortLibArr(parent), CGSortLibArr(i)
  227.                     i = parent
  228.                 ELSE
  229.                     '* Otherwise, the element has reached its proper place in the heap,
  230.                     '* so exit this procedure:
  231.                     EXIT DO
  232.                 END IF
  233.             LOOP
  234.     END SELECT
  235.  
Title: Re: DIM a (Max Index)
Post by: codeguy on October 18, 2018, 03:43:42 am
There really is no point changing from long because long before you overflow the range of long numbers, you'll run out of heap space anyway, even storing _BYTE arrays. at the most, 1, MAYBE 2 GB will be in memory while the other 5 or 4 is untouched, awaiting processing. My suggestion? split whatever it is you're sorting into no more than 1 GB files and sort each individually. Then merge those files sequentially. Even optimistically, I wouldn't expect anything greater than (1/2 million BYTE-size elements/second for each GHz of CPU speed) for a generalized comparison sort. You MIGHT be lucky and get 1.5 GB/GHz this way. Yes, there are specialized sorts that can handle FAR better hourly volume, but the information you give us makes determination of a truly suitable algorithm pretty much impossible. Believe me, I know. I am among the most viewed and upvoted in Sorting algorithms on Quora, as well as a Top Writer for 2018. I don't know everything, but I am very well versed in what I do know. Please feel free to add more specifics and clarify what you really need so we can give a more satisfactory answer. I have only discussed POSSIBLE solutions to your problem as described as that is all the information you've given allows.
Title: Re: DIM a (Max Index)
Post by: soniaa on October 18, 2018, 11:57:10 am
The numbers I have,
is in thes style;

ARRAY     ( DIMensioned  _INTEGER64 )       -         (NO strigs)  message modificated
q1
q2
q3
....
last is q31

NO q(1) etccc

The value in those ARRAY  are in the range (1 - 8.589.934.590)

In the program these numbers they arrive not in sequence !!!  ....and I do not need to sort.

Every time in the moment I have acquire a number, I need to know if that number has already been processed;
q1 .... q31
 - IF Already processed  THEN  howmanytimesthisnumberalreadyprocessed = howmanytimesthisnumberalreadyprocessed +1
   (howmanytimesthisnumberalreadyprocessed  is in the range  0 - 255  (_UNSIGNED _BYTE)

 - IF NO Already processed  THEN  record this because is an new number
   (IF NO Already processed in this case to increase speed, it is not necessary also assign 1 at howmanytimesthisnumberalreadyprocessed)

Title: Re: DIM a (Max Index)
Post by: SMcNeill on October 18, 2018, 12:38:17 pm
Easiest way in this case is a BINARY file:

DIM number AS _INTEGER64
DIM TimesProcessed AS _UNSIGNED _BYTE
OPEN "RecordCount.bin" FOR BINARY AS #1

DO
    INPUT number
    GET #1, number, TimesProcessed
    TimesProcessed = TimesProcessed + 1
    PUT #1, number, TimesProcessed
LOOP UNTIL number = -1 'or some other exit code

The above gets a number, puts it on the hard drive in a file which will be 8,589,934,590 bytes in size, and each byte will tell you how many times that number has "been processed".  (Up to a maximum of 255 times).

***********************************

Now, this method is assuming you have 8 billion records to work with.  If you have a limited dataset (the q31 almost seems to indicate there's only 31 records with values from 0 to 8.5B??), then I'd build an index, store the value and count, and keep it all in memory for speed and efficiency.

But, since it seems like you're going to process billions of records (2 years to run), saving the results in a file seems to me to be the best way to do this.
   
Title: Re: DIM a (Max Index)
Post by: soniaa on October 18, 2018, 12:43:01 pm
the strategies I tried are the following:

strategy1;
create file.txt  for any number that the program create
into this .txt file write howmanytimesthisnumberalreadyprocessed
To do this I do;
 - IF _FILEEXISTS  (if 0 is an new number , if -1 no new number)
 - OPEN FOR APPEND  (if no exist this procedure create the file)
 - WRITE  1  (append new line in .txt file contain "1"   -  it can also be good in this way)
 - CLOSE

strategy2;
DIM Array(1 to 8589934590) AS _UNSIGNED _BYTE
 - IF Array(mynuber) = 0  THEN  GOTO ......
 - IF Array(mynuber) = >0  THEN  Array(mynuber)  =  Array(mynuber)  +1

strategy3;
OPEN "c.dat" FOR BINARY AS #1
GET #1  ,   mynumber  ,   howmanytimesthisnumberalreadyprocessed

 - IF   howmanytimesthisnumberalreadyprocessed  >  0  THEN     PUT #1  ,  mynumber  ,  howmanytimesthisnumberalreadyprocessed  + 1 :GOTO...

 - PUT #1  ,  mynumber  ,  1
Title: Re: DIM a (Max Index)
Post by: SMcNeill on October 18, 2018, 12:48:50 pm
Strategy 3 doesn't even require an IF; just an increment since 0 + 1 = 1

Get #1, number, TimesProcessed
TimesProcessed = TimesProcessed + 1
Put #1, number, TimesProcessed

Logically, if the value is 0, it's going to increment to 1 when you do the math.
Title: Re: DIM a (Max Index)
Post by: TempodiBasic on October 18, 2018, 01:44:32 pm
Hi guys /Ciao ragazzi
Hi Soniaa/Ciao Soniaa
welcome into the bilingual post /benvenuta nel post bilingue

1. it seems that you must calculate frequency of a number in a gigasize array of data ranging as you post
/1. sembra che tu debba calcolare la frequenza di un numero in un array delle dimensioni di GIGA di dati con un range di valori che hai messaggiato

Quote
The numbers I have,
is in thes style;

ARRAY  ( DIMensioned  _INTEGER64 )         -         (NO strigs)  message modificated
q1
q2
q3
....
last is q31

NO q(1) etccc

The value in those ARRAY  are in the range (1 - 8.589.934.590)  ( _INTEGER64 )

Quote
my question;
in the program I'm making, I estimated that the processing process is 2 years.
I have to adopt more strategies to get the maximum speed.

In some phases of the program, I have numbers in the range (1 to 8.589.934.591) (33bit)
NOT all numbers in the range, ONLY 20%.

I have to check if the number I'm processing has already been processed and how many times.

2.about https://www.qb64.org/forum/index.php?topic=705.msg5867#msg5867 (https://www.qb64.org/forum/index.php?topic=705.msg5867#msg5867)
 if you are not able to speak ItalianEnglish please help yourself with Google Translator, it is better than nothing! Moreover you can break the language barrier and you can get more from help of great coders like those are answering to you (except me I am coder as hobby)
/2. riguardo a https://www.qb64.org/forum/index.php?topic=705.msg5867#msg5867 (https://www.qb64.org/forum/index.php?topic=705.msg5867#msg5867)
se non sai parlare l'IngleseItalianizzato, prego aiutati con Google Translator, è meglio di niente! Inoltre puoi infrangere la barriera del linguaggio e puoi ottenere di più dall'aiuto di grandi programmatori come quelli che ti stanno rispondendo (eccetto me che sono programmatore per hobby)

3. about this
Quote
(howmanytimesthisnumberalreadyprocessed  is in the range  0 - 255  (_UNSIGNED _BYTE)
my questions to understand better (you are talking about processing for at least 2 years, an error should waste much time!):
  3.1 how can you say surely that the frequency cannot be more than 255?
  3.2 must  the numbers that are in the range (1 - 8.589.934.590) and that are not acquired,  be registered as 0 frequency?
/3. circa questo 
Quote
(howmanytimesthisnumberalreadyprocessed  is in the range  0 - 255  (_UNSIGNED _BYTE)
le mie domande per capire meglio (stai parlado di processare per almeno 2 anni, un errore sprecherebbe molto tempo!):
  3.1 come puoi dire sicuramente che la frequenza non può essere maggiore di 255?
  3.2 i numeri che sono nel range  (1 - 8.589.934.590) e che non sono acquisiti devono essere registrati con frequenza 0?

4. about ARRAY
Quote
ARRAY     ( DIMensioned  _INTEGER64 )       -         (NO strigs)  message modificated
q1
q2
q3
....
last is q31

NO q(1) etccc

Quote
Every time in the moment I have acquire a number, I need to know if that number has already been processed;
q1 .... q31
 - IF Already processed  THEN  howmanytimesthisnumberalreadyprocessed = howmanytimesthisnumberalreadyprocessed +1
   (howmanytimesthisnumberalreadyprocessed  is in the range  0 - 255  (_UNSIGNED _BYTE)

 - IF NO Already processed  THEN  record this because is an new number
   (IF NO Already processed in this case to increase speed, it is not necessary also assign 1 at howmanytimesthisnumberalreadyprocessed)

are you sure that this is an array? You talk of array q1 q2 q3  until q31 and not q(1) q(2).... It seems to me that this is a packed field data to manage by UDT
4.1 How can you distinguish among q1 q2 q3 ..q31 of the array in the acquiring phase of data? Or do you acquire a single number of large range?
4.2 How do you know that you can have only q31 number to process and to count the frequency?

/4. riguardo al tuo ARRAY
sei sicura che questo sia un array?  lo definisci composto da q31 elementi da q1 a q31 ma non tipo q(1) q(2)...q(31)
mi sembra che questo sia più un gruppo di dati compressi da gestire tramite UDT.
 4.1 Come puoi distinguere tra q1 q2 q3...q31 dell'array nella fase di acquisizione dei dati? Oppure tu acquisci un singolo numero di ampio range?
4.2 Come fai a sapere che tu puoi avere solo q31 numeri da processare e di cui contare la frequenza?

5. can you be clearer using a pseudocode with large chunks? i.e 
Quote
acquiring a large number ranging from 1 to 8.......  -> record it in a list of processed number -> count how many times it has been processed
/5. puoi essere più chiara usando pseudocodice a grandi blocchi_operazioni? per esempio
Quote
acquisire un grande numero nel range da 1 to 8...... -> registrare il numero in una lista di numeri processati -> contare quante volte esso sia stato processato

Thanks to read
/Grazie per la lettura

Title: Re: DIM a (Max Index)
Post by: soniaa on October 18, 2018, 02:41:28 pm
Thanks for all reply;
I do not care about ordering the numbers.

in reply to;

SMcNeill;
the IF is necessary fom my because;
 - if GET acquisition >1  THEN  put value +1   AND  goto  inaditection
 - else   PUT  value  1  AND  goto otherdirection

 TempodiBasic;
3.1 -  its max 255
3.2 -  it is not necessary
4.1 -  I have write a program in mode for no use q(1) because using this have lowww performance,  q1 is moooore fasttt
4.2 -  the program loop in 31 step  forward and back billions of times  (brute force)
5.0 -  ?????
Title: Re: DIM a (Max Index)
Post by: soniaa on October 18, 2018, 02:58:20 pm
The numbers generated by the array   q1   q2  .... q31    they can be  stored together in an singe  registry,
but in some way they must be associated with  howmanytimesalreadyprocessed (range 1 - 255 )

the registry  it must not have  references  of  the  number  has arrived by q1 or q2...
Title: Re: DIM a (Max Index)
Post by: SMcNeill on October 18, 2018, 03:10:20 pm
SMcNeill;
the IF is necessary fom my because;
 - if GET acquisition >1  THEN  put value +1   AND  goto  inaditection
 - else   PUT  value  1  AND  goto otherdirection

The IF action is independent of the GET/ PUT action.

Get #1, number, TimesProcessed
TimesProcessed = TimesProcessed + 1
Put #1, number, TimesProcessed
IF TimesProcessed = 1 THEN GOTO inaditection ELSE GOTO otherdirection
Title: Re: DIM a (Max Index)
Post by: soniaa on October 18, 2018, 03:48:46 pm
SMcNeill;
the IF is necessary fom my because;
 - if GET acquisition >1  THEN  put value +1   AND  goto  inaditection
 - else   PUT  value  1  AND  goto otherdirection

The IF action is independent of the GET/ PUT action.

Get #1, number, TimesProcessed
TimesProcessed = TimesProcessed + 1
Put #1, number, TimesProcessed
IF TimesProcessed = 1 THEN GOTO inaditection ELSE GOTO otherdirection


****what you wrote  is exactly the same as what I wrote !!??...
...or no?
Title: Re: DIM a (Max Index)
Post by: SMcNeill on October 18, 2018, 04:02:31 pm
Here's a real world implementation of how I'd handle this problem:

Code: QB64: [Select]
  1. DIM TimesProcessed AS _UNSIGNED _BYTE
  2.  
  3.  
  4. PRINT "Clearing and Opening Multiple files.  This will take a few seconds."
  5. FOR i = 0 TO 8589 'multiple files to hold our values, since we have such a large range for output
  6.     file$ = "FileCounter" + LTRIM$(RTRIM$(STR$(i))) + ".bin"
  7.     OPEN file$ FOR OUTPUT AS #i + 1: CLOSE #i + 1 'blank the file if it exists
  8.     OPEN file$ FOR BINARY AS #i + 1 'open a new file to track values
  9.  
  10. Limit = 100 'The number of values we're going to process (times 31, since you have values from q1 to q33)
  11.  
  12.  
  13. StartTime## = TIMER
  14. FOR i = 1 TO Limit
  15.     FOR j = 1 TO 31 '31 times since the values are for q1 to q31
  16.         q = INT(RND * 8589934590)
  17.         FileNumber = q \ 1000000
  18.         ValueStored = q MOD 1000000
  19.         GET #FileNumber + 1, ValueStored, TimesProcessed
  20.         TimesProcessed = TimesProcessed + 1
  21.         PUT #FileNumber + 1, ValueStored, TimesProcessed
  22.         IF TimesProcessed = 1 THEN PRINT "New Entry", ELSE PRINT "Processed"; TimesProcessed; " already",
  23.     NEXT
  24. CLOSE 'close all files
  25. FinishTime## = TIMER - StartTime##
  26.  
  27. PRINT USING "#####.##### seconds to process 31 numbers, ### times."; FinishTime##, Limit
  28.  
  29. PRINT "Clean Up the Disk?  (Y/N)"
  30.     _LIMIT 30
  31.     i$ = UCASE$(INKEY$)
  32. LOOP UNTIL i$ = "Y" OR i$ = "N"
  33.  
  34. IF i$ = "Y" THEN
  35.     PRINT
  36.     PRINT "Cleaning disk... This too may take several seconds."
  37.     FOR i = 0 TO 8589 'multiple files to hold our values, since we have such a large range for output
  38.         file$ = "FileCounter" + LTRIM$(RTRIM$(STR$(i))) + ".bin"
  39.         KILL file$
  40.     NEXT
  41.  

Now, a few things to note:

1) Notice that I've broken my data down to 1MB files?  This is because it's much faster for the drive to reserve space for a new file that's 1MB in size, than it is to do the same for one that's 8GB in size.  In fact, it's a lot faster to create almost 9000 1MB files, than it is for my drive to create and reserve space for a single 8GB one.  Play around with the values if you want and see how the speed is affected...

2) It still takes me about 40 seconds to process 100 sets of numbers, using this method -- but that's the hit you're going to get for having and using so much file access.  Times will probably be much better on your SSD than it is on my old one here, but it's still doing to be a lot slower than running some sort of counter in memory.

3) You've never told us what the MAXIMUM AMOUNT OF DATA NEEDS TO BE PROCESSED.  If there's less than a million total records, I'd use an index to see what values have appeared, increment them, and keep the whole process in memory.  You've told us that there's 31 values, ranging from 0 to 8.5B, but you haven't mentioned how many times you're going to repeat this process.  If there's 31 total values, there's absolutely no reason to use file access to track number of times a value is processed.  I'd only go the file route, like above, if it was a process which I had to repeat billions of times for those q1 to q31 numbers.  (And how can you be certain that they're only going to occur less than 255 times total, if there's billions of times of them being processed?)

From what you've said, the above is the best way I can think of the deal with the problem you've presented: tracking a counter of a limitless number of passes, of 31 values, which range from 0 to 8GB...  If what you need ISN'T for a limitless number of passes, then give us an idea of the max number of passes which are used with your data.  There's MUCH better ways to deal with the issue, with smaller datasets.
Title: Re: DIM a (Max Index)
Post by: soniaa on October 18, 2018, 07:11:31 pm
I repeat;
The number I want stored for compare are in the range 0  to 33bit.
No all number in the range the program create, only approximately 20 -30 %.
The program create these number billion of time.
Every when a program create an number I have to know if that number is already processed, then store how many times.
thats is all,
no other.

I think the solutions OPEN c.dat for binary is for the moment the best I try.
If you have other solution muck speed write to me.
Title: Re: DIM a (Max Index)
Post by: TempodiBasic on October 18, 2018, 07:43:36 pm
@soniaa

I can't imagine how to structure a program to do what you are talking about!
so I read again your requests in this thread:

the first:
Quote
I have 8.000.000.000 of items,
the value of any ia is max 100
I want create:
Dim q(8000000000) as _byte
BUT THES Produce error
in my TOSHIBA i3 8GB di ram it gives me Out of memory error on compiling

the second:
Quote
I have a goog idea;
Many of my data is value 0
i can no create array for this,
in thes mode the array I want to DIM are approximately 3.000.000.000
Now BIG Problem;
I want put my data into the arrays one by one,
example:
visitornumber1000000000 = 5
visitornumber2000000000 = 7
I do;
DIM a(1000000000 to 1000000000) as _BYTE
a(1000000000) = 5
but why do you create an array made by one single element with index 1000000000?

the third:
Quote
I have test this;
1  REM $DYNAMIC 
2  DIM a(1000000000 to 1000000000) as _BYTE
3  a(1000000000) = 5
4  REDIM _PRESERVE a(2000000000 to 2000000000)
But receive error
I cannot agree.
If I copy and paste this your code in QB64 ide I get compiling no error!
But we must observe that there is a logic error...the array a() as _BYTE on linecode 2 and the array a () on linecode 4 are not the same array, the first is of type _BYTE while the second is of type SINGLE....
moreover also if I add as _BYTE at the end of linecode 4 it runs ok in QB64.

the fourth
Quote
In some phases of the program, I have numbers in the range (1 to 8.589.934.591) (33bit)
NOT all numbers in the range, ONLY 20%.

I have to check if the number I'm processing has already been processed and how many times.
How many numbers? the only 20% of how many? We must process only 20% of numbers because only they are in the range.
We must record what number and how many times has been precessed. But the 20% of infinite is infinite! :-)
How can you be specific about number of number to process?

the fifth
Quote
The numbers I have,
is in thes style;

ARRAY  ( DIMensioned  _INTEGER64 )         -         (NO strigs)  message modificated
q1
q2
q3
....
last is q31

NO q(1) etccc

The value in those ARRAY  are in the range (1 - 8.589.934.590)  ( _INTEGER64 )
Here you decide to use 31 variables _INTEGER64. Why you do not say!

the sixth:
Quote
In the program these numbers they arrive not in sequence !!!  ....and I do not need to sort.

Every time in the moment I have acquire a number, I need to know if that number has already been processed;
q1 .... q31
Again how many numbers we do not know! Why do you use a cluster of 31 variable of _INTEGER64?

the seventh
Quote
strategy1;
create file.txt  for any number that the program create
into this .txt file write howmanytimesthisnumberalreadyprocessed
To do this I do;
 - IF _FILEEXISTS  (if 0 is an new number , if -1 no new number)
 - OPEN FOR APPEND  (if no exist this procedure create the file)
 - WRITE  1  (append new line in .txt file contain "1"   -  it can also be good in this way)
 - CLOSE
IMHO file/disk access is slower than RAM access... also if you can optimize this strategy above showed

the eighth
Quote
strategy2;
DIM Array(1 to 8589934590) AS _UNSIGNED _BYTE
 - IF Array(mynuber) = 0  THEN  GOTO ......
 - IF Array(mynuber) = >0  THEN  Array(mynuber)  =  Array(mynuber)  +1
this strategy can work if you have enough RAM but so you have already solved the issue

the nineth
Quote
strategy3;
OPEN "c.dat" FOR BINARY AS #1
GET #1  ,   mynumber  ,   howmanytimesthisnumberalreadyprocessed

 - IF   howmanytimesthisnumberalreadyprocessed  >  0  THEN     PUT #1  ,  mynumber  ,  howmanytimesthisnumberalreadyprocessed  + 1 :GOTO...

 - PUT #1  ,  mynumber  ,  1
IMHO an access to file/disk in bynary mode should be faster than to txt files but again slower to RAM access.

the tenth
Quote
the program loop in 31 step  forward and back billions of times  (brute force)
Here I ask "Why a loop in 31 steps and not 31 times a loop of a single step as all 31 variable are _INTEGER64?

the eleventh
Quote
The numbers generated by the array   q1   q2  .... q31    they can be  stored together in an singe  registry,
but in some way they must be associated with  howmanytimesalreadyprocessed (range 1 - 255 )
a simple UDT  can do this:
Code: QB64: [Select]
  1. TYPE q31
  2.     HowMany AS _UNSIGNED _BYTE
  3.     q AS _INTEGER64
each variable of this type uses 9 byte of RAM (8 for _INTEGER64 and 1 for _BYTE) using this structure you can save/load from file/disk each variable with one access and the first byte is byself loaded in howmany and the last 8 bytes in q that record the value of number.

the twelveth
Quote
The number I want stored for compare are in the range 0  to 33bit.
No all number in the range the program create, only approximately 20 -30 %.
The program create these number billion of time.
Every when a program create an number I have to know if that number is already processed, then store how many times.
thats is all,

the other 80-70% of numbers generated by program are <1 and > 8.589.934.590
you process only numbers in the range 1--8.589.934.590 and you rate the frequency....
well the 31 variables _INTEGER64 seems only an your choice in projecting the program...
now I can create a right generator of numbers to emulate your issue and show how is my solution!

Perchè non ti aiuti con il traduttore di Google o Reverso, non è il massimo ma almeno costruisci la frase in inglese più correttamente.
Non tutti i membri di questo forum sono inglesi o americani.

Good Coding
Title: Re: DIM a (Max Index)
Post by: SMcNeill on October 18, 2018, 07:52:39 pm
What I'm really curious about, at this point, is how do we know that there's always going to be less than 256 occurrences of a number?  You're limiting the counter to a single unsigned byte, but it seems like billions of numbers generated would have a chance of at least one of them occurring more than 255 times.
Title: Re: DIM a (Max Index)
Post by: SMcNeill on October 18, 2018, 08:21:16 pm
The best solution:
Code: QB64: [Select]
  1. DIM TimesProcessed AS _UNSIGNED _BYTE
  2.  
  3. limit = 1000000
  4. m = _MEMNEW(9000000000)
  5. _MEMFILL m, m.OFFSET, 9000000000, 0 AS _UNSIGNED _BYTE
  6.  
  7. starttime## = TIMER
  8. FOR i = 1 TO limit
  9.     FOR j = 1 TO 31 'for 31 q's
  10.         q = INT(RND * 9000000000)
  11.         _MEMGET m, m.OFFSET + q, TimesProcessed
  12.         TimesProcessed = TimesProcessed + 1
  13.         _MEMPUT m, m.OFFSET + q, TimesProcessed
  14.     NEXT
  15. finishtime## = TIMER - starttime##
  16. PRINT USING "#####.### seconds for ###,###,###,### numbers"; finishtime##, 31 * limit

A few cavets here:

1) You need a 64-bit version of QB64 to use this.
2) You need 9GB of ram or so.

On my machine, we process 31,000,000 values in 5 seconds, so I'd guess a billion numbers would take less than 5 minutes. (Compared with 40 seconds for 3,100 values using file storage.)

Title: Re: DIM a (Max Index)
Post by: TempodiBasic on October 19, 2018, 07:41:12 pm
@Steve
you've used the direct access to the memory! I think that I cannot get so fast results.
I'm no expert to use RAM in this low level mode!
Great lesson.
Title: Re: DIM a (Max Index)
Post by: soniaa on October 22, 2018, 03:32:03 pm
I have Upgraded the RAM in my Pc;
(old; at the start of this post: 6 GB)
(old; at the middle of this post: 8 GB)
NOW: 12 GB
Title: Re: DIM a (Max Index)
Post by: SMcNeill on October 22, 2018, 03:47:20 pm
I have Upgraded the RAM in my Pc;
(old; at the start of this post: 6 GB)
(old; at the middle of this post: 8 GB)
NOW: 12 GB

Try a method like the one in reply #46.  Use _MEMNEW and _MEMPUT/_MEMGET to write the values directly to memory.
Title: Re: DIM a (Max Index)
Post by: soniaa on October 22, 2018, 03:51:42 pm
Thank you so much,,,  to all of you !!!

What seemed to me a good way (open c.dat for binary);
I found out to be not good,
when the records stored in this file they become 1 billion, GET e PUT it slows down a lot,
...probably because this method does not use much RAM.
Title: Re: DIM a (Max Index)
Post by: soniaa on October 22, 2018, 05:14:51 pm

- - - - My Problem Is Solved. - - - - -

I've just finished a test;

I inserted in my program the method:      REDIM rec(9000000000) as _UNSIGNED _BYTE
The results  they were excellent !!!  .....much more expected !!!
The speed reached is enormous, my program has runned in phases where I had never-never-never been able to arrive.
The estimated time for to get to the end   it was  2 to 6  years first,

--- NOW The estimated time for to get to the end  it is  1 or 2 hours !!! ---