Author Topic: How do you shuffle highly repetitive arrays? (Read 1371 times)

codeguy · « **on:** June 01, 2020, 06:51:03 pm »

'CgShuffleAlgorithm
'* for highly repetitive arrays and not so repetitive ones
'* this is a SLIGHT modification of Knuth Shuffle
OPTION _EXPLICIT
DIM CgMainDemoLow AS _INTEGER64: CgMainDemoLow = 0
DIM CgMainDemoHigh AS _INTEGER64: CgMainDemoHigh = CgMainDemoLow + 1048575

'* rounds defines the number of attempts to find a non-matching array element
DIM rounds AS INTEGER: rounds = 4

'* minimum array element value
DIM CgMainDemoCSTAMin AS DOUBLE: CgMainDemoCSTAMin = 0

'* maximum array element
DIM CgMainDemoCSTAMax AS DOUBLE: CgMainDemoCSTAMax = 1

REDIM CgShuffleTestAray(CgMainDemoLow TO CgMainDemoHigh) AS DOUBLE

'* lagre enough to handle array index values beyond most practical applications because you never konow.
DIM v AS _INTEGER64

'* generate
FOR v = CgMainDemoLow TO CgMainDemoHigh
    CgShuffleTestAray(v) = CgMainDemoCSTAMin + INT(RND * (CgMainDemoCSTAMax - CgMainDemoCSTAMin + 1 / 2))
NEXT

'* display results
FOR v = CgMainDemoLow TO CgMainDemoHigh
    PRINT CgShuffleTestAray(v);
NEXT

CgShuffle CgShuffleTestAray(), CgMainDemoLow, CgMainDemoHigh, rounds

SUB CgShuffle (a() AS DOUBLE, start AS _INTEGER64, finish AS _INTEGER64, rounds AS INTEGER)
    DIM cgsp AS _INTEGER64: cgsp = start
    DIM cgsq AS _INTEGER64: cgsq = finish
    DIM cgsr AS _INTEGER64: cgsr = cgsq - cgsp
    DIM cgss AS _UNSIGNED _INTEGER64
    DIM cgsh AS _INTEGER64
    DO WHILE cgsr > 0
        cgss = rounds
        DO
            cgsh = cgsp + INT(RND * cgsr) + 1
            IF a(cgsp) <> a(cgsh) THEN
                SWAP a(cgsp), a(cgsh)
                EXIT DO
            END IF
            cgss = cgss - 1
        LOOP UNTIL cgss < 1
        cgsr = cgsr - 1
        cgsp = cgsq - cgsr
    LOOP
END SUB

SMcNeill · « **Reply #1 on:** June 01, 2020, 07:48:20 pm »

Isn't the easiest way to "shuffle" an array simply:

FOR I = 0 TO UBOUND(ARRAY)
SWAP ARRAY(I), ARRAY(INT(RND * UBOUND(ARRAY))) + 1
NEXT

Change each array element with another random array element, and call it "shuffled".

codeguy · « **Reply #2 on:** June 01, 2020, 08:23:21 pm »

When the array is highly repetitive, there is a strong likelihood you are swapping like values, which in essence isn't a suitable shuffle and a time-waster, especially when preparing sorted data for building a binary search tree (even using QuickSort or TreeSort). This modification of the Knuth Shuffle virtually eliminates worst-case O(n^2) performance while the penalty exacted is minimal. You don't want to build a BST or perform either of those sorts with already-sorted data.

SMcNeill · « **Reply #3 on:** June 01, 2020, 08:51:19 pm »

Quote from: codeguy on June 01, 2020, 08:23:21 pm

When the array is highly repetitive, there is a strong likelihood you are swapping like values, which in essence isn't a suitable shuffle and a time-waster, especially when preparing sorted data for building a binary search tree (even using QuickSort or TreeSort). This modification of the Knuth Shuffle virtually eliminates worst-case O(n^2) performance while the penalty exacted is minimal. You don't want to build a BST or perform either of those sorts with already-sorted data.

I don't see anything wrong with swapping 2 like values. It's just as fast to swap 2 values as it is to condition check repeatively until you can swap them with a different value. Swap each element once, with another random element, and be done with it. I'd need to see some timed examples to showcase why there's any true reason for more complicated code. Maybe it's just the stuff I normally code personally, but I've never had a single-pass shuffle routine with any time to completion issues. Can you share some examples where a more complex routine makes a significant difference in perform?

STxAxTIC · « **Reply #4 on:** June 01, 2020, 09:25:43 pm »

Yeah this one is interesting - what are we using for a definition of "shuffled" when the uniqueness in the set is very sparse? I see both points but can't decide where I sit on this yet.

And if we have no actual definition, you could just use the log of the multiplicity. (i.e. dimensionless entropy)

*Is* there a solid definition of "shuffled" that you're working with codeguy?

bplus · « **Reply #5 on:** June 01, 2020, 10:18:45 pm »

Google "best shuffle algorithm"
"https://en.wikipedia.org/wiki/Fisher–Yates_shuffle" <<< copy / paste the whole contents between quotes

You will see Steve is still advocating the Naive Method.

Here is an old demo: https://www.qb64.org/forum/index.php?topic=1511.msg110457#msg110457

News:

Author Topic: How do you shuffle highly repetitive arrays? (Read 1371 times)

codeguy

How do you shuffle highly repetitive arrays?

SMcNeill

Re: How do you shuffle highly repetitive arrays?

codeguy

Re: How do you shuffle highly repetitive arrays?

SMcNeill

Re: How do you shuffle highly repetitive arrays?

STxAxTIC

Re: How do you shuffle highly repetitive arrays?

bplus

Re: How do you shuffle highly repetitive arrays?