QB64.org Forum
Active Forums => Programs => Topic started by: codeguy on June 01, 2020, 06:51:03 pm
-
'CgShuffleAlgorithm
'* for highly repetitive arrays and not so repetitive ones
'* this is a SLIGHT modification of Knuth Shuffle
OPTION _EXPLICIT
DIM CgMainDemoLow AS _INTEGER64: CgMainDemoLow = 0
DIM CgMainDemoHigh AS _INTEGER64: CgMainDemoHigh = CgMainDemoLow + 1048575
'* rounds defines the number of attempts to find a non-matching array element
DIM rounds AS INTEGER: rounds = 4
'* minimum array element value
DIM CgMainDemoCSTAMin AS DOUBLE: CgMainDemoCSTAMin = 0
'* maximum array element
DIM CgMainDemoCSTAMax AS DOUBLE: CgMainDemoCSTAMax = 1
REDIM CgShuffleTestAray(CgMainDemoLow TO CgMainDemoHigh) AS DOUBLE
'* lagre enough to handle array index values beyond most practical applications because you never konow.
DIM v AS _INTEGER64
'* generate
FOR v = CgMainDemoLow TO CgMainDemoHigh
CgShuffleTestAray(v) = CgMainDemoCSTAMin + INT(RND * (CgMainDemoCSTAMax - CgMainDemoCSTAMin + 1 / 2))
NEXT
'* display results
FOR v = CgMainDemoLow TO CgMainDemoHigh
PRINT CgShuffleTestAray(v);
NEXT
CgShuffle CgShuffleTestAray(), CgMainDemoLow, CgMainDemoHigh, rounds
SUB CgShuffle (a() AS DOUBLE, start AS _INTEGER64, finish AS _INTEGER64, rounds AS INTEGER)
DIM cgsp AS _INTEGER64: cgsp = start
DIM cgsq AS _INTEGER64: cgsq = finish
DIM cgsr AS _INTEGER64: cgsr = cgsq - cgsp
DIM cgss AS _UNSIGNED _INTEGER64
DIM cgsh AS _INTEGER64
DO WHILE cgsr > 0
cgss = rounds
DO
cgsh = cgsp + INT(RND * cgsr) + 1
IF a(cgsp) <> a(cgsh) THEN
SWAP a(cgsp), a(cgsh)
EXIT DO
END IF
cgss = cgss - 1
LOOP UNTIL cgss < 1
cgsr = cgsr - 1
cgsp = cgsq - cgsr
LOOP
END SUB
-
Isn't the easiest way to "shuffle" an array simply:
FOR I = 0 TO UBOUND(ARRAY)
SWAP ARRAY(I), ARRAY(INT(RND * UBOUND(ARRAY))) + 1
NEXT
Change each array element with another random array element, and call it "shuffled".
-
When the array is highly repetitive, there is a strong likelihood you are swapping like values, which in essence isn't a suitable shuffle and a time-waster, especially when preparing sorted data for building a binary search tree (even using QuickSort or TreeSort). This modification of the Knuth Shuffle virtually eliminates worst-case O(n^2) performance while the penalty exacted is minimal. You don't want to build a BST or perform either of those sorts with already-sorted data.
-
When the array is highly repetitive, there is a strong likelihood you are swapping like values, which in essence isn't a suitable shuffle and a time-waster, especially when preparing sorted data for building a binary search tree (even using QuickSort or TreeSort). This modification of the Knuth Shuffle virtually eliminates worst-case O(n^2) performance while the penalty exacted is minimal. You don't want to build a BST or perform either of those sorts with already-sorted data.
I don't see anything wrong with swapping 2 like values. It's just as fast to swap 2 values as it is to condition check repeatively until you can swap them with a different value. Swap each element once, with another random element, and be done with it. I'd need to see some timed examples to showcase why there's any true reason for more complicated code. Maybe it's just the stuff I normally code personally, but I've never had a single-pass shuffle routine with any time to completion issues. Can you share some examples where a more complex routine makes a significant difference in perform?
-
Yeah this one is interesting - what are we using for a definition of "shuffled" when the uniqueness in the set is very sparse? I see both points but can't decide where I sit on this yet.
And if we have no actual definition, you could just use the log of the multiplicity. (i.e. dimensionless entropy)
*Is* there a solid definition of "shuffled" that you're working with codeguy?
-
Google "best shuffle algorithm"
"https://en.wikipedia.org/wiki/Fisher–Yates_shuffle" <<< copy / paste the whole contents between quotes
You will see Steve is still advocating the Naive Method.
Here is an old demo: https://www.qb64.org/forum/index.php?topic=1511.msg110457#msg110457