Author Topic: DIM a (Max Index) (Read 77730 times)

Qwerkey · « **Reply #15 on:** October 17, 2018, 05:43:42 am »

I can confirm (naturally) that bunching the entries together before writing to the disk as described by Steve significantly reduces the write time.

Qwerkey · « **Reply #16 on:** October 17, 2018, 06:22:30 am »

Using the bunching method, writing 8,000,000 entries takes 7s (on my computer). For 8,000,000,000 entries, the time taken will be 2hrs, which I suppose is not too bad to represent every person on the planet with 1 byte on a home PC.

Code: QB64: [Select]

Start! = TIMER
RANDOMIZE (TIMER)
OPEN "temp.rnd" FOR RANDOM AS #1 LEN = 10000
FIELD 1, 10000 AS a$
FOR b~& = 1 TO 800
    d$ = ""
    FOR e% = 1 TO 10000
        c%% = INT(RND * 100)
        d$ = d$ + CHR$(c%%)
    NEXT e%
    LSET a$ = d$
    PUT #1, b~&
NEXT b~&
GET #1, 2
d$ = RIGHT$(a$, 1)
PRINT d$, ASC(d$)
CLOSE #1
PRINT TIMER - Start!

Incidentally, if you try to open the file ("temp.rnd" in the case of my code here) with Notepad, you may be surprised to find the file apparently full of non-ASCII characters. This happens because of the use of ASCII characters below the value of 32 (some of which are non-printable). Sampling the file and extracting individual elements always gives the expected result. It is also comforting to note that the file with 8 million 1-byte characters is 8Mb in length. Richard

SMcNeill · « **Reply #17 on:** October 17, 2018, 11:52:00 am »

Quote from: Qwerkey on October 17, 2018, 06:22:30 am

Using the bunching method, writing 8,000,000 entries takes 7s (on my computer). For 8,000,000,000 entries, the time taken will be 2hrs, which I suppose is not too bad to represent every person on the planet with 1 byte on a home PC.

For best performance (which you're going to want with a file that size), first shell out to "fsutil fsinfo ntfsinfo c:" (or whichever drive you're saving to), and read the output of the "Bytes Per Physical Sector" (if it exists).

This will tell you the drives sector size and allow you to minimize disk access calls as much as possible. Old drives (and older OSes -- before Win 8, I think), primarily used 512 bytes per sector, so that's the size you want to write to disk. Some newer drives have 4096 bytes per sector, so use that when possible. (I think all the new drives with 3TB+ storage use 4096 bytes per sector, if you have/use any of those.)

When in doubt, I'd default to a dumpsize of 4096 bytes, as that's a perfect multiple of 512 anyway....

My example used a size of 1000, just out of laziness, but it's not the most efficient. If 512 is the sector size, we'd write 512 bytes at once on the drive, then only write 488 the second time... We're still making more access calls than absolutely necessary.

It's a little thing, but done billions of times, it makes a noticeable difference for us. :)

Qwerkey · « **Reply #18 on:** October 17, 2018, 12:09:38 pm »

Crumbs, Steve! You know about optimising write speed to hards disks, as well?

Sonia, you will find that there is nothing that Steve doesn't know: he's our resident Man of Wisdom.

Petr · « **Reply #19 on:** October 17, 2018, 03:47:38 pm »

Quote

Is nothing that Steve doesn't know: he's our resident Man of Wisdom.

I absolutely agree.

soniaa · « **Reply #20 on:** October 17, 2018, 05:40:18 pm »

thank you all,
now I'm testing your suggestions,
then I will let you know which ones are most suitable for my needs.

soniaa · « **Reply #21 on:** October 17, 2018, 05:45:26 pm »

my hardware;
Win 7 64bit
10 GB RAM
SSD 240GB for operating system
SSD 120GB for saving my data
Motherboard sata2 AMDphenomII945 3.00GHz 4processor

SMcNeill · « **Reply #22 on:** October 17, 2018, 05:56:16 pm »

Quote from: SMcNeill on October 16, 2018, 01:54:19 pm

If the issue is just with the index, and not an actual memory issue, can you read and work with the data as a different variable type?

Instead of:

Dim q(8000000000) as _byte

Try:

Dim q(2000000000) as LONG

Once in memory, it's simple enough to break those longs down to individual bytes, if necessary.

A slight demo of this:

Code: QB64: [Select]

DIM LongVar AS LONG
DIM ByteVar(4) AS _BYTE
 
LongVar = &H01100620 '4 bytes in hex: 01/10/06/20
ByteVar(1) = _BLUE32(LongVar)
ByteVar(2) = _GREEN32(LongVar)
ByteVar(3) = _RED32(LongVar)
ByteVar(4) = _ALPHA32(LongVar)
 
FOR I = 1 TO 4
   PRINT ByteVar(I)
NEXT

Your output should be 32, 6, 16, 1. (Feel free to reverse indexes, if you prefer to read them as 1, 16, 6, 32)

It's basically that simple, with us using _RGBA32 to rebuild the LONG back from our 4 bytes, if we need to write back to the disk.[/code]

****************

Most folks don't think of using the color commands with regular values, but all they do is separate a LONG into bytes, or assemble bytes into a LONG... There's no reason why we can't make them work for other stuff like this for us. ;)

soniaa · « **Reply #23 on:** October 17, 2018, 06:03:49 pm »

my question;
in the program I'm making, I estimated that the processing process is 2 years.
I have to adopt more strategies to get the maximum speed.

In some phases of the program, I have numbers in the range (1 to 8.589.934.591) (33bit)
NOT all numbers in the range, ONLY 20%.

I have to check if the number I'm processing has already been processed and how many times.

soniaa · « **Reply #24 on:** October 17, 2018, 06:13:34 pm »

The numbers I have,
is in thes style;

ARRAY ( DIMensioned _INTEGER64 ) - (NO strigs) message modificated
q1
q2
q3
....
last is q31

NO q(1) etccc

The value in those ARRAY are in the range (1 - 8.589.934.590) ( _INTEGER64 )

codeguy · « **Reply #25 on:** October 17, 2018, 06:38:33 pm »

If you know only 20% will be different AND they are relatively random, 1.6 billion 20% of 8 billion. if it's 32-bit values, makes a shade over 6 billion bytes. Really, with a bit of squeezing in a b-tree, you might be able to squeeze that out quickly. TreeSort and use no disk for the sort. Also a HashTable can do the job. or even CountingSort. I'd recommend HashTable as the fastest way for random sets. Even ordered sets, but then why would you ask? But I digress. HashTables provide a fast, convenient way to search for things and correctly constructed, yield an average access of 1.25 seeks/find (or not) and very nearly 0(1) insertion. That too is another doable approach. If you cannot spare space for that and have a brick-wall limit of 6GB, you can run this in segments and mege the results. You could use SteveSort or HashListSort. Both will be pretty quick. HashListSort will even give you a sorted result too.

You're NOT being clear. How can you have single bytes and values well beyond single byte values. My answer remains, but you're jus not being clear at all and yes, I agree my answer is all over the place, but I'm taking into account numerous scenarios, none of which can be used unless you're more specific. FAR more specific.

soniaa · « **Reply #26 on:** October 17, 2018, 07:30:16 pm »

Quote from: codeguy on October 17, 2018, 06:38:33 pm

If you know only 20% will be different AND they are relatively random, 1.6 billion 20% of 8 billion. if it's 32-bit values, makes a shade over 6 billion bytes. Really, with a bit of squeezing in a b-tree, you might be able to squeeze that out quickly. TreeSort and use no disk for the sort. Also a HashTable can do the job. or even CountingSort. I'd recommend HashTable as the fastest way for random sets. Even ordered sets, but then why would you ask? But I digress. HashTables provide a fast, convenient way to search for things and correctly constructed, yield an average access of 1.25 seeks/find (or not) and very nearly 0(1) insertion. That too is another doable approach. If you cannot spare space for that and have a brick-wall limit of 6GB, you can run this in segments and mege the results. You could use SteveSort or HashListSort. Both will be pretty quick. HashListSort will even give you a sorted result too.

I,m sorry,,,
I did not understand anything,,,,not a word,,,

Cobalt · « **Reply #27 on:** October 17, 2018, 07:43:15 pm »

Quote from: soniaa on October 17, 2018, 07:30:16 pm

Quote from: codeguy on October 17, 2018, 06:38:33 pm
If you know only 20% will be different AND they are relatively random, 1.6 billion 20% of 8 billion. if it's 32-bit values, makes a shade over 6 billion bytes. Really, with a bit of squeezing in a b-tree, you might be able to squeeze that out quickly. TreeSort and use no disk for the sort. Also a HashTable can do the job. or even CountingSort. I'd recommend HashTable as the fastest way for random sets. Even ordered sets, but then why would you ask? But I digress. HashTables provide a fast, convenient way to search for things and correctly constructed, yield an average access of 1.25 seeks/find (or not) and very nearly 0(1) insertion. That too is another doable approach. If you cannot spare space for that and have a brick-wall limit of 6GB, you can run this in segments and mege the results. You could use SteveSort or HashListSort. Both will be pretty quick. HashListSort will even give you a sorted result too.

I,m sorry,,,
I did not understand anything,,,,not a word,,,

You are not alone there. ?? :| ??

bplus · « **Reply #28 on:** October 17, 2018, 07:55:13 pm »

;-)) "What was the question?"

codeguy · « **Reply #29 on:** October 17, 2018, 09:29:39 pm »

Sorry my answer was all over the place. For a reasonable range, IntoSort version or QuickSort using radomized median of 3 sn't bad. IntroSort guarantees a REASONABLE and predictable performance and using the n/2 required extra storage version of MergeSort, you can sort up to 4 billion at a time and not exhaust ram if you use disk space for the temporary array. At most, it will require n/2 of the range being sorted if it even resorts to using that part. Othertwise, for a completely in-place, but not guaranteed stable sort, use IntoSort which uses QuickSort, HeapSort and InsertionSort. Using that approach, you can EASILY just use IntroSort and be guaranteed a reasonable completion time, regardless of the range.

News:

Author Topic: DIM a (Max Index) (Read 77730 times)

Qwerkey

Re: DIM a (Max Index)

Qwerkey

Re: DIM a (Max Index)

SMcNeill

Re: DIM a (Max Index)

Qwerkey

Re: DIM a (Max Index)

Petr

Re: DIM a (Max Index)

soniaa

Re: DIM a (Max Index)

soniaa

Re: DIM a (Max Index)

SMcNeill

Re: DIM a (Max Index)

soniaa

Re: DIM a (Max Index)

soniaa

Re: DIM a (Max Index)

codeguy

Re: DIM a (Max Index)

soniaa

Re: DIM a (Max Index)

Cobalt

Re: DIM a (Max Index)

bplus

Re: DIM a (Max Index)

codeguy

Re: DIM a (Max Index)