Author Topic: Binary Search Method (Read 18354 times)

SMcNeill · « **on:** January 24, 2019, 08:09:35 pm »

A simple demonstration of how to implement and use a binary search method:

Code: QB64: [Select]

SCREEN _NEWIMAGE(800, 600, 32)
 
OPEN "466544 Word List.txt" FOR BINARY AS #1
DIM SHARED WordList(1 TO 466544) AS STRING
DO UNTIL EOF(1)
    count = count + 1
    LINE INPUT #1, WordList(count)
LOOP
 
 
 
DO
    _KEYCLEAR
    INPUT "Give me any word"; search$
    search$ = LTRIM$(RTRIM$(search$))
    PRINT "Searching for "; search$
    IF search$ = "" THEN END
    index = FindIndex(search$)
    IF index THEN
        PRINT "Word was found at position "; index; " in "; SearchTimes; "passes."
    ELSE
        PRINT "Word was not in list."
        PRINT "Previous word was ==> "; WordList(LastIndex - 1)
        PRINT "Next word was ======> "; WordList(LastIndex + 1)
        PRINT "Search took"; SearchTimes; "passes."
    END IF
    PRINT
LOOP
 
 
 
FUNCTION FindIndex (search$)
    SHARED SearchTimes, LastIndex
    SearchTimes = 0
    min = 1 'lbound(wordlist)
    max = 370099 'ubound(wordlist)
    DO UNTIL found
        SearchTimes = SearchTimes + 1
        gap = (min + max) \ 2
        compare = _STRICMP(search$, WordList(gap))
        IF compare > 0 THEN
            min = gap + 1
        ELSEIF compare < 0 THEN
            max = gap - 1
        ELSE
            FindIndex = gap
            found = -1
        END IF
        IF max - min <= 1 THEN LastIndex = gap: found = -1 'it's not in the list
        PRINT min, max, search$, WordList(gap), compare
        SLEEP
    LOOP
END FUNCTION
 

The word list has 466544 words in it, so it takes a few seconds to load into memory. Searching for a word in the list, however, and finding its index is almost instantaneous. At *MOST*, this will make 19 passes to either find a word or eliminate it from the list.

Now, let's apply the same list to STx's hash tables as written here: https://www.qb64.org/forum/index.php?topic=1001.msg101972#msg101972

Code: QB64: [Select]

DIM SHARED HashTableSize AS LONG
HashTableSize = 300007 ' Best to use a big prime number. Bigger examples are 611953 and 1014729.
 
 
DIM Counter(300007) 'So we can count how deep some of the tables go
 
PRINT "Loading dictionary..."
 
OPEN "466544 Word List.txt" FOR BINARY AS #1
DO WHILE NOT EOF(1)
    LINE INPUT #1, a$
    d = HashFunc(a$) ' Calculate the hash value (array address) of the word on hand.
    Counter(d) = Counter(d) + 1
LOOP
CLOSE #1
 
FOR i = 0 TO HashTableSize - 1 'HashFunc returns 0 .. HashTableSize - 1, so slot 0 counts too
    IF Counter(i) > max THEN max = Counter(i)
NEXT
 
PRINT "Lists go up to "; max; " levels deep."
 
 
PRINT "Done."
 
END
 
 
 
FUNCTION HashFunc (a AS STRING) ' input string
    DIM sum AS DOUBLE
    sum = HashTableSize
    FOR k = 1 TO LEN(a)
        sum = sum + k * COS(ASC(MID$(a, k, 1))) ' Without the linear factor of k, permutations have same hash values.
    NEXT
    sum = ABS(VAL(ReplaceSubString$(STR$(sum), ".", "")))
    sum = sum MOD HashTableSize
    HashFunc = sum
END FUNCTION
 
FUNCTION ReplaceSubString$ (a AS STRING, b AS STRING, c AS STRING)
    j = INSTR(a, b)
    IF j > 0 THEN
        r$ = LEFT$(a, j - 1) + c + ReplaceSubString$(RIGHT$(a, LEN(a) - j + 1 - LEN(b)), b, c)
    ELSE
        r$ = a
    END IF
    ReplaceSubString$ = r$
END FUNCTION

Notice that the hash table has entries which go up to 10 levels deep in it, so we first need to hash to the proper address and then check the list for up to 10 different entries before finding the right one.
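
To make the "hash first, then walk the bucket" cost concrete, here is a toy sketch of a chained table. BucketOf, InTable, the 7-slot table, and the six sample words are all made up for illustration; this is not the HashFunc from STx's post, just a small stand-in.

```qb64
' Toy chained hash table: each bucket holds "{word}" entries back to back.
' Assumption: BucketOf and the 7-slot size are illustrative only.
CONST Slots = 7
DIM SHARED Bucket(Slots - 1) AS STRING

RESTORE ToyWords
FOR i = 1 TO 6
    READ w$
    b = BucketOf(w$)
    Bucket(b) = Bucket(b) + "{" + w$ + "}" 'colliding words pile up in one bucket
NEXT

PRINT InTable("melon") 'prints -1 (true): the word was stored
PRINT InTable("kiwi") 'prints 0 (false): never stored
END

ToyWords:
DATA apple,banana,cherry,grape,melon,peach

FUNCTION BucketOf (w$)
    'Simple polynomial hash, reduced to 0 .. Slots - 1 at every step.
    h& = 0
    FOR k = 1 TO LEN(w$)
        h& = (h& * 31 + ASC(w$, k)) MOD Slots
    NEXT
    BucketOf = h&
END FUNCTION

FUNCTION InTable (w$)
    'One hash, then a scan over however many entries share that bucket.
    InTable = INSTR(Bucket(BucketOf(w$)), "{" + w$ + "}") > 0
END FUNCTION
```

The deeper a bucket gets, the longer that final scan runs, which is what the "levels deep" count above is measuring.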

19 loops MAX in this case for the Binary Search Method, versus 11 (hash + 10 levels) MAX for the Hash Table...

... so it seems as if the Hash Table wins in this case. (That also depends on how efficient the lookup through those 10 levels is; some commands have more overhead than others.)

BUT...

Hash Tables require your data in memory PLUS the table itself being in memory.

All the Binary Search Method requires is for the data to be in memory (and sorted). (And if memory constraints are a true issue, it works quite well directly with Random Access Files, as you can simply use the index number to reference them with GET #1, index, word$.)
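
A rough sketch of that file-based variant, under stated assumptions: the file name "sorted.dat", the 24-byte record width, the five demo words, and the helper FileIndex& are all hypothetical, and the records must already be stored in sorted order.

```qb64
' Binary search straight against a RANDOM-access file instead of an array.
' Assumptions: "sorted.dat" and the 24-byte record width are made up here.
DIM rec AS STRING * 24

OPEN "sorted.dat" FOR RANDOM AS #1 LEN = 24

' Build a tiny sorted demo file so the sketch is self-contained.
RESTORE DemoWords
FOR i = 1 TO 5
    READ w$
    rec = w$ 'fixed-length assignment pads with spaces
    PUT #1, i, rec
NEXT

PRINT FileIndex&("cherry", 5) 'prints 3: third record
PRINT FileIndex&("zebra", 5) 'prints 0: not in the file
CLOSE #1
END

DemoWords:
DATA apple,banana,cherry,grape,melon

FUNCTION FileIndex& (search$, total AS LONG)
    DIM rec AS STRING * 24
    DIM lo AS LONG, hi AS LONG, middle AS LONG
    lo = 1: hi = total
    DO WHILE lo <= hi
        middle = (lo + hi) \ 2
        GET #1, middle, rec 'read just this one record from disk
        c = _STRCMP(search$, RTRIM$(rec))
        IF c = 0 THEN FileIndex& = middle: EXIT FUNCTION
        IF c > 0 THEN lo = middle + 1 ELSE hi = middle - 1
    LOOP
    FileIndex& = 0 'range is empty: the word is not present
END FUNCTION
```

Only one record is held in memory at a time; the trade-off is a disk read per pass instead of an array access.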

Either way, it seems to be a nice improvement over Pete's old method which he talks about here: https://www.qb64.org/forum/index.php?topic=1001.msg101983#msg101983

And, as for Stx's question:

Quote

17 steps? 18 steps? Where did these numbers come from? How do I interpret them in O(n) notation?

That was covered here: https://www.qb64.org/forum/index.php?topic=1001.msg101981#msg101981

Quote

A max of N searches, where 2^N > Index. ;)

2^18 = 262,144
INDEX = 466544 words
2^19 = 524,288

So our max number of searches is 2^N > Index... Or 19 in this case.
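
The same arithmetic as a small sketch, in case you want to plug in another list size (the variable names here are just for illustration):

```qb64
' Find the smallest N with 2 ^ N > count by doubling until we pass it.
DIM count AS LONG, n AS LONG, span AS LONG
count = 466544
n = 0
span = 1
DO WHILE span <= count
    span = span * 2
    n = n + 1
LOOP
PRINT "Worst case:"; n; "passes for"; count; "entries" 'n comes out as 19 here
```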

Quote

I think the burden's on you to do the speed test m8 (for a large data set with lots of performance demand).

I think the above offers a fairly decent comparison of the two methods. Feel free to run them and actually time them in various performance tests if you want.

One large advantage of the binary search is its lower memory usage and how simple it is to implement.

A large advantage to hash tables is the fact that your data doesn't need to be sorted to search properly.

Both seem equally useful to me, and neither should be completely dismissed out-of-hand. ;)

NOTE: Feel free to remove (or comment out) the following statements in the code above, if you want. They're mainly there just to help highlight the search process.

Code: QB64: [Select]

        PRINT min, max, search$, WordList(gap), compare
        SLEEP

STxAxTIC · « **Reply #1 on:** January 24, 2019, 08:23:04 pm »

Nice and honest writeup.

An actual speed test would be to rattle off, say, 50,000 consecutive lookups. To say it "feels instantaneous" for one word at a time is a bit pedestrian. Let's talk Pete into making the test.

EDIT: Just looked up the average lookup time for binary search: O(log n), and its worst case is O(log n) too. A hash table lookup averages O(1), and only on hash's worst day do you get O(n). That solves the speed question.

You're right about the memory argument. Good thing it's no longer 1973 and we have been cramming gigabytes into keychains since 2002. Hashing is safe, memory-wise. (Speed is still coveted though.)

SMcNeill · « **Reply #2 on:** January 24, 2019, 08:42:05 pm »

Code: QB64: [Select]

SCREEN _NEWIMAGE(800, 600, 32)
 
OPEN "466544 Word List.txt" FOR BINARY AS #1
DIM SHARED WordList(1 TO 466544) AS STRING
DO UNTIL EOF(1)
    count = count + 1
    LINE INPUT #1, WordList(count)
LOOP
 
DIM RandomWords(50000) AS STRING
FOR i = 1 TO 50000
    c = INT(RND * 466544) + 1
    RandomWords(i) = WordList(c)
NEXT
 
t# = TIMER
FOR i = 1 TO 50000
    index = FindIndex(RandomWords(i))
NEXT
PRINT USING "###.### seconds lookup"; TIMER - t#
 
END
 
 
 
 
DO
    _KEYCLEAR
    INPUT "Give me any word"; search$
    search$ = LTRIM$(RTRIM$(search$))
    PRINT "Searching for "; search$
    IF search$ = "" THEN END
    index = FindIndex(search$)
    IF index THEN
        PRINT "Word was found at position "; index; " in "; SearchTimes; "passes."
    ELSE
        PRINT "Word was not in list."
        PRINT "Previous word was ==> "; WordList(LastIndex - 1)
        PRINT "Next word was ======> "; WordList(LastIndex + 1)
        PRINT "Search took"; SearchTimes; "passes."
    END IF
    PRINT
LOOP
 
 
 
FUNCTION FindIndex (search$)
    SHARED SearchTimes, LastIndex
    SearchTimes = 0
    min = 1 'lbound(wordlist)
    max = 370099 'ubound(wordlist)
    DO UNTIL found
        SearchTimes = SearchTimes + 1
        gap = (min + max) \ 2
        compare = _STRICMP(search$, WordList(gap))
        IF compare > 0 THEN
            min = gap + 1
        ELSEIF compare < 0 THEN
            max = gap - 1
        ELSE
            FindIndex = gap
            found = -1
        END IF
        IF max - min <= 1 THEN LastIndex = gap: found = -1 'it's not in the list
        ' PRINT min, max, search$, WordList(gap), compare
        ' SLEEP
    LOOP
END FUNCTION

0.055 seconds on my PC to find 50,000 random words. Personally, I’d call that “almost instantaneous”, but what do I know? My naming sense sucks. LOL

Now you just need to speed test your method to look up the same 50,000 words. (Which is why there's no RANDOMIZE TIMER in there.)
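
A quick sketch of why leaving the seed alone matters for a fair comparison (the five-pick loops are only for illustration):

```qb64
' Without RANDOMIZE, RND replays the same sequence on every run, so two
' separate benchmark programs can be fed an identical set of picks.
FOR i = 1 TO 5
    PRINT INT(RND * 466544) + 1;
NEXT
PRINT

' Seeding from the clock instead gives a different set on each run.
RANDOMIZE TIMER
FOR i = 1 TO 5
    PRINT INT(RND * 466544) + 1;
NEXT
PRINT
```

Run it twice: the first row should repeat exactly, the second should not.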

Pete · « **Reply #3 on:** January 24, 2019, 08:50:53 pm »

Hey I'm still over in the other thread with my new method that makes you guys look like losers! Sorry, I gave brain cells today at the Head Cross clinic and for a few hours after that procedure, I find myself channeling Ted.

Pete

SMcNeill · « **Reply #4 on:** January 24, 2019, 08:53:52 pm »

Never mind. It’s not worth the effort. Use it or not; there it is.

Pete · « **Reply #5 on:** January 24, 2019, 09:12:57 pm »

Not alphabetizing a file for a binary search would be as stupid as not using the Dewey Decimal system for shelving books: the book's in here somewhere, just start doing a shelf by shelf search. So I don't think we should be considering that as part of this discussion. This is about an organized word list vs a hash table, and for the hash method alphabetizing is of absolutely no consequence.

In my 25 year old example, the speed was adequate because I organized the data and sub-divided the list into different alphabetized folders. If I had to loop through 50,000 random words in a single list with the slower INPUT AS method, I'd still be waiting to see if I spelled dumb-ass correctly.

Pete

STxAxTIC · « **Reply #6 on:** January 24, 2019, 09:14:15 pm »

Steve, the big-O notation is designed to specifically not compare apples to oranges. Dicscrete calculus will be a lesson for another time... I just cant right now, my facepalm hurts.

SMcNeill · « **Reply #7 on:** January 24, 2019, 09:17:35 pm »

Use it or not; there it is.

I’d still love to see a similar speed test for your method. I’m curious how big a difference it could be.

STxAxTIC · « **Reply #8 on:** January 24, 2019, 09:24:47 pm »

It's open source. Have a ball. (Might wanna optimize it first.)

Pete · « **Reply #9 on:** January 24, 2019, 09:29:23 pm »

Quote from: STxAxTIC on January 24, 2019, 09:14:15 pm

Steve, the big-O notation is designed to specifically not compare apples to oranges. Dicscrete calculus will be a lesson for another time... I just cant right now, my facepalm hurts.

Dicscrete calculus, Bill? I think you'd better choose a search method and get that spell checker up and working for crap's sake.

If you mean discrete calculus, is that one of those math courses offered at night for active singles? If so, what is the probability of completing the course without contracting a nasty case of standard deviations?

Pete :D

STxAxTIC · « **Reply #10 on:** January 24, 2019, 10:21:24 pm »

Haha, not sure who promised I was making a spell checker. The dictionizer thing I posted is still a working product... or... prototype. I consider the whole case closed pretty much.

bplus · « **Reply #11 on:** January 25, 2019, 12:16:04 am »

Steve, there is a bug in your FindIndex function:

Code: QB64: [Select]

SCREEN _NEWIMAGE(800, 600, 32)
 
OPEN "466544 Word List.txt" FOR BINARY AS #1
DIM SHARED WordList(1 TO 466544) AS STRING
DO UNTIL EOF(1)
    count = count + 1
    LINE INPUT #1, WordList(count)
LOOP
CLOSE #1
DIM RandomWords(50000) AS STRING
FOR i = 1 TO 50000
    c = INT(RND * 466544) + 1
    RandomWords(i) = WordList(c)
NEXT
PRINT "Steve's 50000 word lookup with binary search"
t# = TIMER
FOR i = 1 TO 10
    index = FindIndex(RandomWords(i))
    IF index THEN PRINT WordList(index) ELSE PRINT RandomWords(i) + " not found!"
NEXT
PRINT USING "###.### seconds lookup"; TIMER - t#
 
 
'pete's method
OPEN "466544 Word List.txt" FOR BINARY AS #1
word$ = SPACE$(LOF(1))
GET #1, , word$
CLOSE #1
PRINT "Pete's 50000 word lookup using INSTR to get position in string of word."
t# = TIMER
FOR i = 1 TO 10
    place = INSTR(word$, RandomWords(i) + CHR$(13) + CHR$(10))
    PRINT MID$(word$, place, LEN(RandomWords(i)))
NEXT
PRINT USING "####.### seconds lookup"; TIMER - t#
 
END
 
 
 
 
DO
    _KEYCLEAR
    INPUT "Give me any word"; search$
    search$ = LTRIM$(RTRIM$(search$))
    PRINT "Searching for "; search$
    IF search$ = "" THEN END
    index = FindIndex(search$)
    IF index THEN
        PRINT "Word was found at position "; index; " in "; SearchTimes; "passes."
    ELSE
        PRINT "Word was not in list."
        PRINT "Previous word was ==> "; WordList(LastIndex - 1)
        PRINT "Next word was ======> "; WordList(LastIndex + 1)
        PRINT "Search took"; SearchTimes; "passes."
    END IF
    PRINT
LOOP
 
 
 
FUNCTION FindIndex (search$)
    SHARED SearchTimes, LastIndex
    SearchTimes = 0
    min = 1 'lbound(wordlist)
    max = 370099 'ubound(wordlist)
    DO UNTIL found
        SearchTimes = SearchTimes + 1
        gap = (min + max) \ 2
        compare = _STRICMP(search$, WordList(gap))
        IF compare > 0 THEN
            min = gap + 1
        ELSEIF compare < 0 THEN
            max = gap - 1
        ELSE
            FindIndex = gap
            found = -1
        END IF
        IF max - min <= 1 THEN LastIndex = gap: found = -1 'it's not in the list
        ' PRINT min, max, search$, WordList(gap), compare
        ' SLEEP
    LOOP
END FUNCTION
 
 

SMcNeill · « **Reply #12 on:** January 25, 2019, 02:08:45 am »

A couple of little things were working against the original running as intended:

Code: QB64: [Select]

SCREEN _NEWIMAGE(800, 600, 32)
 
OPEN "466544 Word List.txt" FOR BINARY AS #1
DIM SHARED WordList(466545) AS STRING
PRINT "Loading library"
DO UNTIL EOF(1)
    count = count + 1
    LINE INPUT #1, WordList(count)
LOOP
CLOSE #1
 
PRINT "Sorting"
Sort WordList()
 
PRINT "Looking up"
DIM RandomWords(50000) AS STRING
FOR i = 1 TO 50000
    c = INT(RND * 466544) + 1
    RandomWords(i) = WordList(c)
NEXT
PRINT "Steve's 50000 word lookup with binary search"
t# = TIMER
FOR i = 1 TO 10
    index = FindIndex(RandomWords(i))
    PRINT "Searching for: "; RandomWords(i),
    IF index THEN PRINT WordList(index) ELSE PRINT "NOT FOUND!"
NEXT
PRINT USING "###.### seconds lookup"; TIMER - t#
 
 
'pete's method
 
'FOR i = 1 TO 466545
'    adding brackets to words to stop a false match, such as "dog" being found in "dogfood"
'    word$ = word$ + "{" + WordList(i) + "}" 'build up the search list
'    IF i < 50001 THEN RandomWords(i) = "{" + RandomWords(i) + "}"
'NEXT
 
OPEN "466544 Word List.txt" FOR BINARY AS #1
word$ = SPACE$(LOF(1))
GET #1, , word$
CLOSE #1
 
 
 
PRINT "Pete's 50000 word lookup using INSTR to get position in string of word."
t# = TIMER
FOR i = 1 TO 10
    place = INSTR(word$, RandomWords(i) + CHR$(13) + CHR$(10))
    PRINT "Searching for: "; RandomWords(i),
    IF place THEN PRINT MID$(word$, place, LEN(RandomWords(i))) ELSE PRINT "NOT FOUND!"
NEXT
PRINT USING "####.### seconds lookup"; TIMER - t#
 
END
 
 
 
 
DO
    _KEYCLEAR
    INPUT "Give me any word"; search$
    search$ = LTRIM$(RTRIM$(search$))
    PRINT "Searching for "; search$
    IF search$ = "" THEN END
    index = FindIndex(search$)
    IF index THEN
        PRINT WordList(index); " was found at position "; index; " in "; SearchTimes; "passes."
    ELSE
        PRINT "Word was not in list."
        PRINT "Previous word was ==> "; WordList(LastIndex - 1)
        PRINT "Next word was ======> "; WordList(LastIndex + 1)
        PRINT "Search took"; SearchTimes; "passes."
    END IF
    PRINT
LOOP
 
 
 
FUNCTION FindIndex (search$)
    SHARED SearchTimes, LastIndex
    SearchTimes = 0
    min = 1 'first index to search (inclusive)
    max = UBOUND(WordList) + 1 'one past the last index (exclusive)

    DO UNTIL found
        SearchTimes = SearchTimes + 1
        gap = (max + min) \ 2
        compare = _STRCMP(search$, WordList(gap))
        IF compare > 0 THEN
            min = gap + 1 'search$ sorts after WordList(gap), so skip past it
        ELSEIF compare < 0 THEN
            max = gap 'search$ sorts before WordList(gap); gap is the new exclusive bound
        ELSE
            FindIndex = gap
            found = -1
            EXIT FUNCTION
        END IF
        'The range shrinks on every pass, so a missing word always ends the loop here.
        IF max - min < 1 THEN LastIndex = gap: found = -1 'it's not in the list
        ' PRINT min, max, search$, WordList(gap), compare
        ' SLEEP
    LOOP
END FUNCTION
 
SUB Sort (Array() AS STRING)
    'The dice sorting routine's comb sort algorithm. (This copy swaps the array
    'elements directly; it does not go through _MEM.)
    'It's more than fast enough for our needs here, I think.  ;)
    gap = UBOUND(array)
    DO
        gap = 10 * gap \ 13
        IF gap < 1 THEN gap = 1
        i = 0
        swapped = 0
        DO
            IF _STRCMP(Array(i), Array(i + gap)) > 0 THEN
                SWAP Array(i), Array(i + gap)
                swapped = -1
            END IF
            i = i + 1
        LOOP UNTIL i + gap > UBOUND(Array)
    LOOP UNTIL swapped = 0 AND gap = 1
END SUB
 

First issue: The data in the wordlist isn't fully alphabetical... O_o!! It needed a quick sort so we can search it properly. Just look at the first few lines; it should've been obvious, but I overlooked it:

Quote

2
1080
&c
10-point
10th
11-point
12-point
16-point
18-point
1st

Since when does "2" come before "1"??

Sorting routine added, and bug was squashed.
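
For anyone who wants to check a list before trusting a binary search on it, here is a minimal sketch. FirstUnsorted& is a hypothetical helper, the four demo entries are taken from the quote above, and the 0 return value assumes the array's lower bound is not negative.

```qb64
' Confirm a list is in _STRCMP order before running a binary search on it.
DIM demo(1 TO 4) AS STRING
demo(1) = "2": demo(2) = "1080": demo(3) = "10-point": demo(4) = "10th"

bad& = FirstUnsorted&(demo())
IF bad& THEN
    PRINT "Out of order at index"; bad&; ": "; demo(bad&); " follows "; demo(bad& - 1)
ELSE
    PRINT "List is sorted."
END IF
END

FUNCTION FirstUnsorted& (arr() AS STRING)
    'Returns the index of the first entry that sorts before its predecessor,
    'or 0 when the whole array is already in ascending order.
    FOR i& = LBOUND(arr) + 1 TO UBOUND(arr)
        IF _STRCMP(arr(i& - 1), arr(i&)) > 0 THEN FirstUnsorted& = i&: EXIT FUNCTION
    NEXT
    FirstUnsorted& = 0
END FUNCTION
```

With the demo data it flags index 2, since "1080" sorts before "2".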

Glitch 2:

OPEN "466544 Word List.txt" FOR BINARY AS #1
max = 370099 'ubound(wordlist)

There's a small difference in those two numbers... (I was originally using a smaller file, before I found this one on the web and grabbed it for the larger data set.)

There's no way we could find a ton of words in the list, while only checking about 3/4 of them...

Glitch fixed.

Glitch #3:

ELSE
FindIndex = gap
found = -1
END IF
IF max - min <= 1 THEN LastIndex = gap: found = -1 'it's not in the list

We're missing one important piece of program flow here:

ELSE
FindIndex = gap
found = -1
EXIT FUNCTION
END IF
IF max - min < 1 THEN LastIndex = gap: found = -1 'it's not in the list

We'd find the proper result, and then if the gap was small enough between the last numbers, we'd lie and claim it wasn't actually in the list...

Glitch fixed.

All should be working as advertised now, in the code above. ;)

Sidenote: Your implementation of Pete's method is glitched. Only the end of each word is anchored (by the trailing CHR$(13) + CHR$(10)), so "dog" would be found if "bulldog" was included in the list, even when "dog" itself wasn't.

I was going to do a fix for the issue, but you'd need to take a nap while it was running:

Code: QB64: [Select]

FOR i = 1 TO 466545 ' adding brackets to words to stop a false match, such as "dog" being found in "dogfood"
    word$ = word$ + "{" + WordList(i) + "}" 'build up the search list
    IF i < 50001 THEN RandomWords(i) = "{" + RandomWords(i) + "}"
NEXT

I tried to rebuild the already loaded and sorted list with brackets delimiting the words, but I didn't have the patience to sit and listen to my PC try to sound like an airplane with all the fans going full throttle for who knows how long....
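
One possible way around the rebuild, sketched under assumptions rather than tested against the full list: since the file is already CR+LF delimited, put a single CR+LF in front of the loaded text and search for CR+LF + word + CR+LF, which anchors both ends of the word. WholeWordPos& and the tiny blob$ below are illustrative names only.

```qb64
' Whole-word INSTR lookup using the line breaks already in the file.
crlf$ = CHR$(13) + CHR$(10)

' Stand-in for the file contents, with one extra CR+LF added in front.
blob$ = crlf$ + "bulldog" + crlf$ + "cat" + crlf$ + "dogfood" + crlf$ + "emu" + crlf$

PRINT WholeWordPos&(blob$, "dog") 'prints 0: neither "bulldog" nor "dogfood" matches
PRINT WholeWordPos&(blob$, "emu") 'nonzero: start position of "emu" in blob$
END

FUNCTION WholeWordPos& (blob$, target$)
    crlf$ = CHR$(13) + CHR$(10)
    p& = INSTR(blob$, crlf$ + target$ + crlf$)
    'Skip the two delimiter bytes so the result points at the word itself.
    IF p& THEN WholeWordPos& = p& + 2 ELSE WholeWordPos& = 0
END FUNCTION
```

This needs only one concatenation for the whole file, instead of one per word.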

luke · « **Reply #13 on:** January 25, 2019, 05:42:29 am »

And just because we can, here's a recursive binary search:

Code: QB64: [Select]

DEFLNG A-Z
 
OPEN "words" FOR INPUT AS #1
DIM SHARED WordList(99170) AS STRING
DO UNTIL EOF(1)
    LINE INPUT #1, WordList(count)
    count = count + 1
LOOP
 
DO
    INPUT "> ", search$
    IF search$ = "" THEN END
    index = find(search$, 0, UBOUND(wordlist))
    IF index >= 0 THEN
        PRINT "Found "; WordList(index); " at"; index
    ELSE
        PRINT "Not found"
    END IF
LOOP
 
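FUNCTION find (word$, start, finish)
    'An empty range means the word is not in the list. A two-element range can
    'recurse into finish < start, which the size = 1 checks below never catch.
    IF start > finish THEN find = -1: EXIT FUNCTION
    size = finish - start + 1
    mid = size \ 2
    cmp = _STRCMP(WordList(start + mid), word$)
    SELECT CASE cmp
        CASE 0
            find = start + mid
        CASE IS > 0
            IF size = 1 THEN find = -1 ELSE find = find(word$, start, start + mid - 1)
        CASE IS < 0
            IF size = 1 THEN find = -1 ELSE find = find(word$, start + mid + 1, finish)
    END SELECT
END FUNCTION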

(My word list had all the words beginning with a capital letter listed first, so I'm using _STRCMP)
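
A two-line sketch of the difference, using made-up sample words:

```qb64
' _STRCMP compares byte values, so every capital letter sorts before any
' lowercase one; _STRICMP ignores case.
PRINT _STRCMP("Zebra", "apple") 'prints -1: "Z" (90) comes before "a" (97)
PRINT _STRICMP("Zebra", "apple") 'prints 1: "zebra" comes after "apple"
```

Whichever one the list was sorted with is the one the search has to use as well.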

SMcNeill · « **Reply #14 on:** January 25, 2019, 09:18:49 am »

Quote from: SMcNeill on January 24, 2019, 09:17:35 pm

Use it or not; there it is.

I’d still love to see a similar speed test for your method. I’m curious how big a difference it could be.

Quote from: STxAxTIC on January 24, 2019, 09:24:47 pm

It's open source. Have a ball. (Might wanna optimize it first.)

So, since Stx wasn't willing to do a speed comparison, I did:

Code: QB64: [Select]

DIM SHARED HashTableSize AS LONG
HashTableSize = 300007 ' Best to use a big prime number. Bigger examples are 611953 and 1014729.
 
DIM SHARED LB AS STRING ' Make sure that bracketing sequences do not appear in the data source, otherwise use (a) special character(s).
DIM SHARED RB AS STRING
LB = "{"
RB = "}"
 
DIM SHARED EnglishDictionary(HashTableSize) AS STRING ' Hash table size does not need to equal the size of the source dictionary itself.
 
OPEN "466544 Word List.txt" FOR BINARY AS #1
 
DIM SHARED WordList(466545) AS STRING
PRINT "Loading library"
DO UNTIL EOF(1)
    count = count + 1
    LINE INPUT #1, WordList(count)
LOOP
CLOSE #1
 
Sort WordList()
 
i = 0
FOR i = 1 TO 466545
    b$ = WordList(i) 'the word to store
    c$ = LTRIM$(RTRIM$(STR$(i))) 'to store the index
    d = HashFunc(b$) ' Calculate the hash value (array address) of the word on hand.
    EnglishDictionary(d) = EnglishDictionary(d) + LB + b$ + RB + LB + c$ + RB
NEXT
CLOSE #1
PRINT "Done creating Hash Table."
 
' Done developing fast lookup tool. Now time for an application.
 
PRINT "Looking up"
DIM RandomWords(50000) AS STRING
FOR i = 1 TO 50000
    c = INT(RND * 466544) + 1
    RandomWords(i) = WordList(c)
NEXT
 
t# = TIMER
FOR i = 1 TO 50000
    'PRINT "Searching for: "; RandomWords(i),
    l$ = Lookup$(RandomWords(i))
    IF l$ <> "" THEN
        l = INSTR(l$, " ")
        word$ = LEFT$(l$, l - 1)
        index = VAL(MID$(l$, l + 1))
        'PRINT WordList(index) 'to show that we got the index back successfully
        'ELSE
        'PRINT "NOT FOUND!"
    END IF
NEXT
PRINT USING "###.### seconds lookup using Hash Table"; TIMER - t#
 
t# = TIMER
FOR i = 1 TO 50000
    index = FindIndex(RandomWords(i))
    'PRINT "Searching for: "; RandomWords(i),
    'IF index THEN 'to show that we got the index back successfully
    '    PRINT WordList(index)
    'ELSE
    '    PRINT "NOT FOUND!"
    'END IF
NEXT
PRINT USING "###.### seconds lookup using Binary Search"; TIMER - t#
 
PRINT
PRINT "And just to compare results -- the first 10 words:"
FOR i = 1 TO 10
    'PRINT "Searching for: "; RandomWords(i),
    l$ = Lookup$(RandomWords(i))
    IF l$ <> "" THEN
        l = INSTR(l$, " ")
        word$ = LEFT$(l$, l - 1)
        index = VAL(MID$(l$, l + 1))
        PRINT WordList(index), 'to show that we got the index back successfully
    ELSE
        PRINT "NOT FOUND!",
    END IF
    index = FindIndex(RandomWords(i))
    'PRINT "Searching for: "; RandomWords(i),
    IF index THEN 'to show that we got the index back successfully
        PRINT WordList(index)
    ELSE
        PRINT "NOT FOUND!"
    END IF
NEXT
 
 
 
 
FUNCTION Lookup$ (a AS STRING)
    r$ = ""
    b$ = EnglishDictionary(HashFunc(a))
    c$ = ""
    d$ = ""
    IF b$ <> "" THEN
        DO WHILE c$ <> a
            c$ = ReturnBetween(b$, LB, RB)
            IF c$ = "" THEN EXIT DO
            b$ = RIGHT$(b$, LEN(b$) - LEN(LB + c$ + RB))
            d$ = ReturnBetween(b$, LB, RB)
        LOOP
    END IF
    r$ = a + "  " + d$
    Lookup$ = r$
END FUNCTION
 
FUNCTION ReturnBetween$ (a AS STRING, b AS STRING, c AS STRING) ' input string, left bracket, right bracket
    i = INSTR(a, b)
    j = INSTR(a, c)
    f = LEN(c)
    ReturnBetween$ = MID$(a, i + f, j - (i + f))
END FUNCTION
 
FUNCTION HashFunc (a AS STRING) ' input string
    DIM sum AS DOUBLE
    sum = HashTableSize
    FOR k = 1 TO LEN(a)
        sum = sum + k * COS(ASC(MID$(a, k, 1))) ' Without the linear factor of k, permutations have same hash values.
    NEXT
    sum = ABS(VAL(ReplaceSubString$(STR$(sum), ".", "")))
    sum = sum MOD HashTableSize
    HashFunc = sum
END FUNCTION
 
FUNCTION ReplaceSubString$ (a AS STRING, b AS STRING, c AS STRING)
    j = INSTR(a, b)
    IF j > 0 THEN
        r$ = LEFT$(a, j - 1) + c + ReplaceSubString$(RIGHT$(a, LEN(a) - j + 1 - LEN(b)), b, c)
    ELSE
        r$ = a
    END IF
    ReplaceSubString$ = r$
END FUNCTION
 
 
SUB Sort (Array() AS STRING)
    'The dice sorting routine's comb sort algorithm. (This copy swaps the array
    'elements directly; it does not go through _MEM.)
    'It's more than fast enough for our needs here, I think.  ;)
    gap = UBOUND(array)
    DO
        gap = 10 * gap \ 13
        IF gap < 1 THEN gap = 1
        i = 0
        swapped = 0
        DO
            IF _STRCMP(Array(i), Array(i + gap)) > 0 THEN
                SWAP Array(i), Array(i + gap)
                swapped = -1
            END IF
            i = i + 1
        LOOP UNTIL i + gap > UBOUND(Array)
    LOOP UNTIL swapped = 0 AND gap = 1
END SUB
 
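FUNCTION FindIndex (search$)
    SHARED SearchTimes, LastIndex
    SearchTimes = 0
    min = 1 'first index to search (inclusive)
    max = UBOUND(WordList) + 1 'one past the last index (exclusive)

    DO UNTIL found
        SearchTimes = SearchTimes + 1
        gap = (max + min) \ 2
        compare = _STRCMP(search$, WordList(gap))
        IF compare > 0 THEN
            min = gap + 1 'search$ sorts after WordList(gap), so skip past it
        ELSEIF compare < 0 THEN
            max = gap 'search$ sorts before WordList(gap); gap is the new exclusive bound
        ELSE
            FindIndex = gap
            found = -1
            EXIT FUNCTION
        END IF
        'The range shrinks on every pass, so a missing word always ends the loop here.
        IF max - min < 1 THEN LastIndex = gap: found = -1 'it's not in the list
        ' PRINT min, max, search$, WordList(gap), compare
        ' SLEEP
    LOOP
END FUNCTION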
 
 

In both search instances, we use the same word list, the same 50,000 word search list, and we return both the word we're searching for and its index position in the list.

Now, I imagine that Stx will come along and tell me I'm doing something wrong with the routine he provided, but the screenshot below shows the results I get. If there's something wrong, I'm more than happy to learn what the issue is:
