Text Only | Text with Attachments

QB64.org Forum

Active Forums => Programs => Topic started by: codeguy on June 01, 2020, 06:51:03 pm

Title: How do you shuffle highly repetitive arrays?
Post by: codeguy on June 01, 2020, 06:51:03 pm: Code: [Select]
'CgShuffleAlgorithm '* for highly repetitive arrays and not so repetitive ones '* this is a SLIGHT modification of Knuth Shuffle OPTION _EXPLICIT DIM CgMainDemoLow AS _INTEGER64: CgMainDemoLow = 0 DIM CgMainDemoHigh AS _INTEGER64: CgMainDemoHigh = CgMainDemoLow + 1048575 '* rounds defines the number of attempts to find a non-matching array element DIM rounds AS INTEGER: rounds = 4 '* minimum array element value DIM CgMainDemoCSTAMin AS DOUBLE: CgMainDemoCSTAMin = 0 '* maximum array element DIM CgMainDemoCSTAMax AS DOUBLE: CgMainDemoCSTAMax = 1 REDIM CgShuffleTestAray(CgMainDemoLow TO CgMainDemoHigh) AS DOUBLE '* lagre enough to handle array index values beyond most practical applications because you never konow. DIM v AS _INTEGER64 '* generate FOR v = CgMainDemoLow TO CgMainDemoHigh CgShuffleTestAray(v) = CgMainDemoCSTAMin + INT(RND * (CgMainDemoCSTAMax - CgMainDemoCSTAMin + 1 / 2)) NEXT '* display results FOR v = CgMainDemoLow TO CgMainDemoHigh PRINT CgShuffleTestAray(v); NEXT CgShuffle CgShuffleTestAray(), CgMainDemoLow, CgMainDemoHigh, rounds SUB CgShuffle (a() AS DOUBLE, start AS _INTEGER64, finish AS _INTEGER64, rounds AS INTEGER) DIM cgsp AS _INTEGER64: cgsp = start DIM cgsq AS _INTEGER64: cgsq = finish DIM cgsr AS _INTEGER64: cgsr = cgsq - cgsp DIM cgss AS _UNSIGNED _INTEGER64 DIM cgsh AS _INTEGER64 DO WHILE cgsr > 0 cgss = rounds DO cgsh = cgsp + INT(RND * cgsr) + 1 IF a(cgsp) <> a(cgsh) THEN SWAP a(cgsp), a(cgsh) EXIT DO END IF cgss = cgss - 1 LOOP UNTIL cgss < 1 cgsr = cgsr - 1 cgsp = cgsq - cgsr LOOP END SUB
Title: Re: How do you shuffle highly repetitive arrays?
Post by: SMcNeill on June 01, 2020, 07:48:20 pm: Isn't the easiest way to "shuffle" an array simply:

FOR I = 0 TO UBOUND(ARRAY)
SWAP ARRAY(I), ARRAY(INT(RND * UBOUND(ARRAY))) + 1
NEXT

Change each array element with another random array element, and call it "shuffled".
Title: Re: How do you shuffle highly repetitive arrays?
Post by: codeguy on June 01, 2020, 08:23:21 pm: When the array is highly repetitive, there is a strong likelihood you are swapping like values, which in essence isn't a suitable shuffle and a time-waster, especially when preparing sorted data for building a binary search tree (even using QuickSort or TreeSort). This modification of the Knuth Shuffle virtually eliminates worst-case O(n^2) performance while the penalty exacted is minimal. You don't want to build a BST or perform either of those sorts with already-sorted data.
Title: Re: How do you shuffle highly repetitive arrays?
Post by: SMcNeill on June 01, 2020, 08:51:19 pm: Quote from: codeguy on June 01, 2020, 08:23:21 pm
When the array is highly repetitive, there is a strong likelihood you are swapping like values, which in essence isn't a suitable shuffle and a time-waster, especially when preparing sorted data for building a binary search tree (even using QuickSort or TreeSort). This modification of the Knuth Shuffle virtually eliminates worst-case O(n^2) performance while the penalty exacted is minimal. You don't want to build a BST or perform either of those sorts with already-sorted data.

I don't see anything wrong with swapping 2 like values. It's just as fast to swap 2 values as it is to condition check repeatively until you can swap them with a different value. Swap each element once, with another random element, and be done with it. I'd need to see some timed examples to showcase why there's any true reason for more complicated code. Maybe it's just the stuff I normally code personally, but I've never had a single-pass shuffle routine with any time to completion issues. Can you share some examples where a more complex routine makes a significant difference in perform?
Title: Re: How do you shuffle highly repetitive arrays?
Post by: STxAxTIC on June 01, 2020, 09:25:43 pm: Yeah this one is interesting - what are we using for a definition of "shuffled" when the uniqueness in the set is very sparse? I see both points but can't decide where I sit on this yet.

And if we have no actual definition, you could just use the log of the multiplicity. (i.e. dimensionless entropy)

*Is* there a solid definition of "shuffled" that you're working with codeguy?
Title: Re: How do you shuffle highly repetitive arrays?
Post by: bplus on June 01, 2020, 10:18:45 pm: Google "best shuffle algorithm"
"https://en.wikipedia.org/wiki/Fisher–Yates_shuffle" <<< copy / paste the whole contents between quotes

You will see Steve is still advocating the Naive Method.

Here is an old demo: https://www.qb64.org/forum/index.php?topic=1511.msg110457#msg110457