A simple demostration of how to implement and use a binary search method:
count = count + 1
INPUT "Give me any word"; search$
PRINT "Searching for "; search$
index = FindIndex(search$)
PRINT "Word was found at position "; index;
" in "; SearchTimes;
"passes." PRINT "Word was not in list." PRINT "Previous word was ==> "; WordList
(LastIndex
- 1) PRINT "Next word was ======> "; WordList
(LastIndex
+ 1) PRINT "Search took"; SearchTimes;
"passes."
SearchTimes = 0
min = 1 'lbound(wordlist)
max = 370099 'ubound(wordlist)
SearchTimes = SearchTimes + 1
gap = (min + max) \ 2
compare
= _STRICMP(search$
, WordList
(gap
)) min = gap + 1
max = gap - 1
FindIndex = gap
found = -1
IF max
- min
<= 1 THEN LastIndex
= gap: found
= -1 'it's not in the list PRINT min
, max
, search$
, WordList
(gap
), compare
The word list has 466544 words in it, so it takes a few seconds to load into memory. Searching for a word in the list however, and finding its index, is almost instantaneous. At *MOST*, this will make 19 passes to either find, or eliminate a word from the list.
Now, let's apply the same list to STx's hash tables as written here:
https://www.qb64.org/forum/index.php?topic=1001.msg101972#msg101972HashTableSize = 300007 ' Best to use a big prime number. Bigger examples are 611953 and 1014729.
DIM Counter
(300007) 'So we can count how deep some of the tables go
PRINT "Loading dictionary..."
d = HashFunc(a$) ' Calculate the hash value (array address) of the word on hand.
Counter(d) = Counter(d) + 1
IF Counter
(i
) > max
THEN max
= Counter
(i
)
PRINT "Lists go up to "; max;
" levels deep."
sum = HashTableSize
sum
= sum
+ k
* COS(ASC(MID$(a
, k
, 1))) ' Without the linear factor of k, permutations have same hash values. sum
= ABS(VAL(ReplaceSubString$
(STR$(sum
), ".", ""))) sum
= sum
MOD HashTableSize
HashFunc = sum
r$
= LEFT$(a
, j
- 1) + c
+ ReplaceSubString$
(RIGHT$(a
, LEN(a
) - j
+ 1 - LEN(b
)), b
, c
) r$ = a
ReplaceSubString$ = r$
Notice that the hash table has entries which go up to 10 levels deep in it, so we first need to hash to the proper address and then check the list for up to 10 different entries before finding the right one.
19 loops MAX in this case for the Binary Search Method, verses 11 (hash + 10 levels) MAX for the Hash Table...
... so it seems as if the Hash Table wins in this case. (Also depending on how efficient the method is for looking up the 10 levels deep info is; some commands have more overhead than others.)
BUT...
Hash Tables require your data in memory PLUS the table itself being in memory.
All the Binary Search Method requires is for the data to be memory (and sorted). (And if memory constraints are a true issue, it works quite well directly with Random Access Files, as you can simply use the index number to reference them with GET #1, index, word$.)
Either way, it seems to be a nice improvement over Pete's old method which he talks about here:
https://www.qb64.org/forum/index.php?topic=1001.msg101983#msg101983And, as for Stx's question:
17 steps? 18 steps? Where did these numbers come from? How do I interpret them in O(n) notation?
That was covered here:
https://www.qb64.org/forum/index.php?topic=1001.msg101981#msg101981A max of N searches, where 2^N > Index. ;)
2^18 = 262,144
INDEX = 466544 words
2^19 = 524,288
So our max number of searches is 2^N > Index... Or 19 in this case.
I think the burden's on you to do the speed test m8 (for a large data set with lots of performance demand).
I think the above offers a fairly decent comparison of the two methods. Feel free to run them and actually time them in various performance tests if you want.
One large advantage to the binary search is lesser memory usage and simplicity to implement.
A large advantage to hash tables is the fact that your data doesn't need to be sorted to search properly.
Both seem equally useful to me, and neither should be completely dismissed out-of-hand. ;)
NOTE: Feel free to remove (or comment out) the following statements, if you want, in the code above. It's mainly there just to help highlight the search process.
PRINT min
, max
, search$
, WordList
(gap
), compare