QB64.org Forum

Active Forums => Programs => Topic started by: SMcNeill on January 24, 2019, 08:09:35 pm

Title: Binary Search Method
Post by: SMcNeill on January 24, 2019, 08:09:35 pm
A simple demostration of how to implement and use a binary search method:

Code: QB64: [Select]
  1. SCREEN _NEWIMAGE(800, 600, 32)
  2.  
  3. OPEN "466544 Word List.txt" FOR BINARY AS #1
  4. DIM SHARED WordList(1 TO 466544) AS STRING
  5.     count = count + 1
  6.     LINE INPUT #1, WordList(count)
  7.  
  8.  
  9.  
  10.     _KEYCLEAR
  11.     INPUT "Give me any word"; search$
  12.     search$ = LTRIM$(RTRIM$(search$))
  13.     PRINT "Searching for "; search$
  14.     IF search$ = "" THEN END
  15.     index = FindIndex(search$)
  16.     IF index THEN
  17.         PRINT "Word was found at position "; index; " in "; SearchTimes; "passes."
  18.     ELSE
  19.         PRINT "Word was not in list."
  20.         PRINT "Previous word was ==> "; WordList(LastIndex - 1)
  21.         PRINT "Next word was ======> "; WordList(LastIndex + 1)
  22.         PRINT "Search took"; SearchTimes; "passes."
  23.     END IF
  24.     PRINT
  25.  
  26.  
  27.  
  28. FUNCTION FindIndex (search$)
  29.     SHARED SearchTimes, LastIndex
  30.     SearchTimes = 0
  31.     min = 1 'lbound(wordlist)
  32.     max = 370099 'ubound(wordlist)
  33.     DO UNTIL found
  34.         SearchTimes = SearchTimes + 1
  35.         gap = (min + max) \ 2
  36.         compare = _STRICMP(search$, WordList(gap))
  37.         IF compare > 0 THEN
  38.             min = gap + 1
  39.         ELSEIF compare < 0 THEN
  40.             max = gap - 1
  41.         ELSE
  42.             FindIndex = gap
  43.             found = -1
  44.         END IF
  45.         IF max - min <= 1 THEN LastIndex = gap: found = -1 'it's not in the list
  46.         PRINT min, max, search$, WordList(gap), compare
  47.         SLEEP
  48.     LOOP
  49.  

The word list has 466544 words in it, so it takes a few seconds to load into memory.  Searching for a word in the list however, and finding its index, is almost instantaneous.  At *MOST*, this will make 19 passes to either find, or eliminate a word from the list.

Now, let's apply the same list to STx's hash tables as written here: https://www.qb64.org/forum/index.php?topic=1001.msg101972#msg101972

Code: QB64: [Select]
  1. DIM SHARED HashTableSize AS LONG
  2. HashTableSize = 300007 ' Best to use a big prime number. Bigger examples are 611953 and 1014729.
  3.  
  4.  
  5. DIM Counter(300007) 'So we can count how deep some of the tables go
  6.  
  7. PRINT "Loading dictionary..."
  8.  
  9. OPEN "466544 Word List.txt" FOR BINARY AS #1
  10.     LINE INPUT #1, a$
  11.     d = HashFunc(a$) ' Calculate the hash value (array address) of the word on hand.
  12.     Counter(d) = Counter(d) + 1
  13.  
  14. FOR i = 1 TO 300007
  15.     IF Counter(i) > max THEN max = Counter(i)
  16.  
  17. PRINT "Lists go up to "; max; " levels deep."
  18.  
  19.  
  20. PRINT "Done."
  21.  
  22.  
  23.  
  24.  
  25. FUNCTION HashFunc (a AS STRING) ' input string
  26.     DIM sum AS DOUBLE
  27.     sum = HashTableSize
  28.     FOR k = 1 TO LEN(a)
  29.         sum = sum + k * COS(ASC(MID$(a, k, 1))) ' Without the linear factor of k, permutations have same hash values.
  30.     NEXT
  31.     sum = ABS(VAL(ReplaceSubString$(STR$(sum), ".", "")))
  32.     sum = sum MOD HashTableSize
  33.     HashFunc = sum
  34.  
  35. FUNCTION ReplaceSubString$ (a AS STRING, b AS STRING, c AS STRING)
  36.     j = INSTR(a, b)
  37.     IF j > 0 THEN
  38.         r$ = LEFT$(a, j - 1) + c + ReplaceSubString$(RIGHT$(a, LEN(a) - j + 1 - LEN(b)), b, c)
  39.     ELSE
  40.         r$ = a
  41.     END IF
  42.     ReplaceSubString$ = r$

Notice that the hash table has entries which go up to 10 levels deep in it, so we first need to hash to the proper address and then check the list for up to 10 different entries before finding the right one.

19 loops MAX in this case for the Binary Search Method, verses 11 (hash + 10 levels) MAX for the Hash Table...

... so it seems as if the Hash Table wins in this case. (Also depending on how efficient the method is for looking up the 10 levels deep info is; some commands have more overhead than others.)

BUT...

Hash Tables require your data in memory PLUS the table itself being in memory. 

All the Binary Search Method requires is for the data to be memory (and sorted).  (And if memory constraints are a true issue, it works quite well directly with Random Access Files, as you can simply use the index number to reference them with GET #1, index, word$.)



Either way, it seems to be a nice improvement over Pete's old method which he talks about here: https://www.qb64.org/forum/index.php?topic=1001.msg101983#msg101983

And, as for Stx's question:
Quote
17 steps? 18 steps? Where did these numbers come from? How do I interpret them in O(n) notation?

That was covered here: https://www.qb64.org/forum/index.php?topic=1001.msg101981#msg101981

Quote
A max of N searches, where 2^N > Index.  ;)

2^18 = 262,144
INDEX = 466544 words
2^19 = 524,288

So our max number of searches is 2^N > Index... Or 19 in this case. 

Quote
I think the burden's on you to do the speed test m8 (for a large data set with lots of performance demand).

I think the above offers a fairly decent comparison of the two methods.  Feel free to run them and actually time them in various performance tests if you want. 

One large advantage to the binary search is lesser memory usage and simplicity to implement.

A large advantage to hash tables is the fact that your data doesn't need to be sorted to search properly.

Both seem equally useful to me, and neither should be completely dismissed out-of-hand.  ;)

NOTE: Feel free to remove (or comment out) the following statements, if you want, in the code above.  It's mainly there just to help highlight the search process.

Code: QB64: [Select]
  1.         PRINT min, max, search$, WordList(gap), compare
  2.         SLEEP

Title: Re: Binary Search Method
Post by: STxAxTIC on January 24, 2019, 08:23:04 pm
Nice and honest writeup.

An actual speed test will be to rattle off say 50,000 consecutive lookups. To say it "feels instaneous" for one word at a time is a bit pedestrian. Let's talk Pete into making the test.

EDIT: Just looked up the average conversion time for binary sort is O(log n). That solves the speed question. On binary sort's best day you have O(n), which is incidentally what you get on hash's worst day.

You're right about the memory argument. Good thing it's no longer 1973 and we have been cramming gigabytes into keychains since 2002. Hashing is safe, memory-wise.  (Speed is still coveted though.)
Title: Re: Binary Search Method
Post by: SMcNeill on January 24, 2019, 08:42:05 pm
Code: QB64: [Select]
  1. SCREEN _NEWIMAGE(800, 600, 32)
  2.  
  3. OPEN "466544 Word List.txt" FOR BINARY AS #1
  4. DIM SHARED WordList(1 TO 466544) AS STRING
  5.     count = count + 1
  6.     LINE INPUT #1, WordList(count)
  7.  
  8. DIM RandomWords(50000) AS STRING
  9. FOR i = 1 TO 50000
  10.     c = INT(RND * 466544) + 1
  11.     RandomWords(i) = WordList(c)
  12.  
  13. t# = TIMER
  14. FOR i = 1 TO 50000
  15.     index = FindIndex(RandomWords(i))
  16. PRINT USING "###.### seconds lookup"; TIMER - t#
  17.  
  18.  
  19.  
  20.  
  21.  
  22.     _KEYCLEAR
  23.     INPUT "Give me any word"; search$
  24.     search$ = LTRIM$(RTRIM$(search$))
  25.     PRINT "Searching for "; search$
  26.     IF search$ = "" THEN END
  27.     index = FindIndex(search$)
  28.     IF index THEN
  29.         PRINT "Word was found at position "; index; " in "; SearchTimes; "passes."
  30.     ELSE
  31.         PRINT "Word was not in list."
  32.         PRINT "Previous word was ==> "; WordList(LastIndex - 1)
  33.         PRINT "Next word was ======> "; WordList(LastIndex + 1)
  34.         PRINT "Search took"; SearchTimes; "passes."
  35.     END IF
  36.     PRINT
  37.  
  38.  
  39.  
  40. FUNCTION FindIndex (search$)
  41.     SHARED SearchTimes, LastIndex
  42.     SearchTimes = 0
  43.     min = 1 'lbound(wordlist)
  44.     max = 370099 'ubound(wordlist)
  45.     DO UNTIL found
  46.         SearchTimes = SearchTimes + 1
  47.         gap = (min + max) \ 2
  48.         compare = _STRICMP(search$, WordList(gap))
  49.         IF compare > 0 THEN
  50.             min = gap + 1
  51.         ELSEIF compare < 0 THEN
  52.             max = gap - 1
  53.         ELSE
  54.             FindIndex = gap
  55.             found = -1
  56.         END IF
  57.         IF max - min <= 1 THEN LastIndex = gap: found = -1 'it's not in the list
  58.         ' PRINT min, max, search$, WordList(gap), compare
  59.         ' SLEEP
  60.     LOOP

0.055 seconds on my PC to find 50,000 random words.  Personally, I’d call that “almost instantaneous”, but what do I know?  My naming sense sucks.  LOL

Now you just need to speed test your method to lookup the same 50,000 words.  (Which is why theres no RANDOMIZER TIMER in there.)
Title: Re: Binary Search Method
Post by: Pete on January 24, 2019, 08:50:53 pm
Hey I'm still over in the other thread with my new method that makes you guys look like losers! Sorry, I gave brain cells today at the Head Cross clinic and for a few hours after that procedure, I find myself channeling Ted.

Pete
Title: Re: Binary Search Method
Post by: SMcNeill on January 24, 2019, 08:53:52 pm
Nevermind.  It’s not worth the effort.  Use it or not; there it is.
Title: Re: Binary Search Method
Post by: Pete on January 24, 2019, 09:12:57 pm
Not alphabetizing a file for a binary search would be as stupid as not using the Dewey Decimal system for stacking books. The books in here somewhere, just start doing a shelf by shelf search. So I don't think we should be considering that as part of this discussion. This is about an organized word list vs a hash table, which alphabetizing in the hash method is of absolutely no consequence.

In my 25 year old example, the speed was adequate because I organized the data and sub-divided the list into different alphabetized folders. If I had to loop through 50,000 random words in a single list with the slower INPUT AS method, I'd still be waiting to see if I spelled dumb-ass correctly.

Pete
Title: Re: Binary Search Method
Post by: STxAxTIC on January 24, 2019, 09:14:15 pm
Steve, the big-O notation is designed to specifically not compare apples to oranges. Dicscrete calculus will be a lesson for another time... I just cant right now, my facepalm hurts.
Title: Re: Binary Search Method
Post by: SMcNeill on January 24, 2019, 09:17:35 pm
Use it or not; there it is.

I’d still love to see a similar speed test for your method.  I’m curious how big a difference it could be.
Title: Re: Binary Search Method
Post by: STxAxTIC on January 24, 2019, 09:24:47 pm
It's open source. Have a ball. (Might wanna optimize it first.)
Title: Re: Binary Search Method
Post by: Pete on January 24, 2019, 09:29:23 pm
Steve, the big-O notation is designed to specifically not compare apples to oranges. Dicscrete calculus will be a lesson for another time... I just cant right now, my facepalm hurts.

Dicscrete calculus, Bill? I think you'd better chose a search method and get that spell checker up and working for craps sake.

If you mean discrete calculus, is that one of those math courses offered at night for active singles? If so, what is the probably of completing the course without contracting a nasty case of standard deviations?

Pete :D
Title: Re: Binary Search Method
Post by: STxAxTIC on January 24, 2019, 10:21:24 pm
Haha, not sure who promised I was making a spell checker. The dictionizer thing I posted is still a working product... or... prototype. I consider the whole case closed pretty much.
Title: Re: Binary Search Method
Post by: bplus on January 25, 2019, 12:16:04 am
Steve there is a bug in your FindIndex function:
Code: QB64: [Select]
  1. SCREEN _NEWIMAGE(800, 600, 32)
  2.  
  3. OPEN "466544 Word List.txt" FOR BINARY AS #1
  4. DIM SHARED WordList(1 TO 466544) AS STRING
  5.     count = count + 1
  6.     LINE INPUT #1, WordList(count)
  7. DIM RandomWords(50000) AS STRING
  8. FOR i = 1 TO 50000
  9.     c = INT(RND * 466544) + 1
  10.     RandomWords(i) = WordList(c)
  11. PRINT "Steve's 50000 word lookup with binary search"
  12. t# = TIMER
  13. FOR i = 1 TO 10
  14.     index = FindIndex(RandomWords(i))
  15.     IF index THEN PRINT WordList(index) ELSE PRINT RandomWords(i) + " not found!"
  16. PRINT USING "###.### seconds lookup"; TIMER - t#
  17.  
  18.  
  19. 'pete's method
  20. OPEN "466544 Word List.txt" FOR BINARY AS #1
  21. word$ = SPACE$(LOF(1))
  22. GET #1, , word$
  23. PRINT "Pete's 50000 word lookup using INSTR to get position in string of word."
  24. t# = TIMER
  25. FOR i = 1 TO 10
  26.     place = INSTR(word$, RandomWords(i) + CHR$(13) + CHR$(10))
  27.     PRINT MID$(word$, place, LEN(RandomWords(i)))
  28. PRINT USING "####.### seconds lookup"; TIMER - t#
  29.  
  30.  
  31.  
  32.  
  33.  
  34.     _KEYCLEAR
  35.     INPUT "Give me any word"; search$
  36.     search$ = LTRIM$(RTRIM$(search$))
  37.     PRINT "Searching for "; search$
  38.     IF search$ = "" THEN END
  39.     index = FindIndex(search$)
  40.     IF index THEN
  41.         PRINT "Word was found at position "; index; " in "; SearchTimes; "passes."
  42.     ELSE
  43.         PRINT "Word was not in list."
  44.         PRINT "Previous word was ==> "; WordList(LastIndex - 1)
  45.         PRINT "Next word was ======> "; WordList(LastIndex + 1)
  46.         PRINT "Search took"; SearchTimes; "passes."
  47.     END IF
  48.     PRINT
  49.  
  50.  
  51.  
  52. FUNCTION FindIndex (search$)
  53.     SHARED SearchTimes, LastIndex
  54.     SearchTimes = 0
  55.     min = 1 'lbound(wordlist)
  56.     max = 370099 'ubound(wordlist)
  57.     DO UNTIL found
  58.         SearchTimes = SearchTimes + 1
  59.         gap = (min + max) \ 2
  60.         compare = _STRICMP(search$, WordList(gap))
  61.         IF compare > 0 THEN
  62.             min = gap + 1
  63.         ELSEIF compare < 0 THEN
  64.             max = gap - 1
  65.         ELSE
  66.             FindIndex = gap
  67.             found = -1
  68.         END IF
  69.         IF max - min <= 1 THEN LastIndex = gap: found = -1 'it's not in the list
  70.         ' PRINT min, max, search$, WordList(gap), compare
  71.         ' SLEEP
  72.     LOOP
  73.  
  74.  
 [ This attachment cannot be displayed inline in 'Print Page' view ]  
Title: Re: Binary Search Method
Post by: SMcNeill on January 25, 2019, 02:08:45 am
A couple of little things was working against the original running as intended:

Code: QB64: [Select]
  1. SCREEN _NEWIMAGE(800, 600, 32)
  2.  
  3. OPEN "466544 Word List.txt" FOR BINARY AS #1
  4. DIM SHARED WordList(466545) AS STRING
  5. PRINT "Loading library"
  6.     count = count + 1
  7.     LINE INPUT #1, WordList(count)
  8.  
  9. PRINT "Sorting"
  10. Sort WordList()
  11.  
  12. PRINT "Looking up"
  13. DIM RandomWords(50000) AS STRING
  14. FOR i = 1 TO 50000
  15.     c = INT(RND * 466544) + 1
  16.     RandomWords(i) = WordList(c)
  17. PRINT "Steve's 50000 word lookup with binary search"
  18. t# = TIMER
  19. FOR i = 1 TO 10
  20.     index = FindIndex(RandomWords(i))
  21.     PRINT "Searching for: "; RandomWords(i),
  22.     IF index THEN PRINT WordList(index) ELSE PRINT "NOT FOUND!"
  23. PRINT USING "###.### seconds lookup"; TIMER - t#
  24.  
  25.  
  26. 'pete's method
  27.  
  28. 'FOR i = 1 TO 466545
  29. '    adding brackets to words to stop a false match, such as "dog" being found in "dogfood"
  30. '    word$ = word$ + "{" + WordList(i) + "}" 'build up the search list
  31. '    IF i < 50001 THEN RandomWords(i) = "{" + RandomWords(i) + "}"
  32. 'NEXT
  33.  
  34. OPEN "466544 Word List.txt" FOR BINARY AS #1
  35. word$ = SPACE$(LOF(1))
  36. GET #1, , word$
  37.  
  38.  
  39.  
  40. PRINT "Pete's 50000 word lookup using INSTR to get position in string of word."
  41. t# = TIMER
  42. FOR i = 1 TO 10
  43.     place = INSTR(word$, RandomWords(i) + CHR$(13) + CHR$(10))
  44.     PRINT "Searching for: "; RandomWords(i),
  45.     IF place THEN PRINT MID$(word$, place, LEN(RandomWords(i))) ELSE PRINT "NOT FOUND!"
  46. PRINT USING "####.### seconds lookup"; TIMER - t#
  47.  
  48.  
  49.  
  50.  
  51.  
  52.     _KEYCLEAR
  53.     INPUT "Give me any word"; search$
  54.     search$ = LTRIM$(RTRIM$(search$))
  55.     PRINT "Searching for "; search$
  56.     IF search$ = "" THEN END
  57.     index = FindIndex(search$)
  58.     IF index THEN
  59.         PRINT WordList(index); " was found at position "; index; " in "; SearchTimes; "passes."
  60.     ELSE
  61.         PRINT "Word was not in list."
  62.         PRINT "Previous word was ==> "; WordList(LastIndex - 1)
  63.         PRINT "Next word was ======> "; WordList(LastIndex + 1)
  64.         PRINT "Search took"; SearchTimes; "passes."
  65.     END IF
  66.     PRINT
  67.  
  68.  
  69.  
  70. FUNCTION FindIndex (search$)
  71.     SHARED SearchTimes, LastIndex
  72.     SearchTimes = 0
  73.     min = 1 'lbound(wordlist)
  74.     max = 466544 'ubound(wordlist)
  75.  
  76.     DO UNTIL found
  77.         SearchTimes = SearchTimes + 1
  78.         gap = (max + min) \ 2
  79.         'IF gap = oldgap THEN gap = gap + 1
  80.         compare = _STRCMP(search$, WordList(gap))
  81.         IF compare > 0 THEN
  82.             oldmin = min
  83.             min = gap
  84.         ELSEIF compare < 0 THEN
  85.             oldmax = max
  86.             max = gap
  87.         ELSE
  88.             FindIndex = gap
  89.             found = -1
  90.             EXIT FUNCTION
  91.         END IF
  92.         oldgap = gap
  93.         IF max - min < 1 THEN LastIndex = gap: found = -1 'it's not in the list
  94.         ' PRINT min, max, search$, WordList(gap), compare
  95.         ' SLEEP
  96.     LOOP
  97.  
  98. SUB Sort (Array() AS STRING)
  99.     'The dice sorting routine, optimized to use _MEM and a comb sort algorithm.
  100.     'It's more than fast enough for our needs here I th ink.  ;)
  101.     gap = UBOUND(array)
  102.     DO
  103.         gap = 10 * gap \ 13
  104.         IF gap < 1 THEN gap = 1
  105.         i = 0
  106.         swapped = 0
  107.         DO
  108.             IF _STRCMP(Array(i), Array(i + gap)) > 0 THEN
  109.                 SWAP Array(i), Array(i + gap)
  110.                 swapped = -1
  111.             END IF
  112.             i = i + 1
  113.         LOOP UNTIL i + gap > UBOUND(Array)
  114.     LOOP UNTIL swapped = 0 AND gap = 1
  115.  

First issue:  The data in the wordlist isn't fully alphabetical...  O_o!!   It needed a quick sort so we can search it properly.  Just look at the first few lines; it should've been obvious, but I overlooked it:

Quote
2
1080
&c
10-point
10th
11-point
12-point
16-point
18-point
1st

Since when does "2" come before "1"??

Sorting routine added, and bug was squashed.



Glitch 2:

OPEN "466544 Word List.txt" FOR BINARY AS #1
max = 370099 'ubound(wordlist)

There's a small difference in those two numbers...  (I'd originally was using a smaller file, before I found this one on the web and grabbed it for the larger data set.)

There's no way we could find a ton of words in the list, while only checking about 3/4 of them...

Glitch fixed.



Glitch #3:

        ELSE
            FindIndex = gap
            found = -1
        END IF
        IF max - min <= 1 THEN LastIndex = gap: found = -1 'it's not in the list

We're missing one important piece of program flow here: 

        ELSE
            FindIndex = gap
            found = -1
            EXIT FUNCTION
        END IF
        IF max - min < 1 THEN LastIndex = gap: found = -1 'it's not in the list

We'd find the proper result, and then if the gap was small enough between the last numbers, we'd lie and claim it wasn't actually in the list...

Glitch fixed.



All should be working as advertised now, in the code above.  ;)

Sidenote:  Your implementation of Pete's method is glitched.  "dog" would be found if "dogfood" was included in the list, while "dog" actually wasn't.

I was going to do a fix for the issue, but you'd need to take a nap while it was running:

Code: QB64: [Select]
  1. FOR i = 1 TO 466545 ' adding brackets to words to stop a false match, such as "dog" being found in "dogfood"
  2.     word$ = word$ + "{" + WordList(i) + "}" 'build up the search list
  3.     IF i < 50001 THEN RandomWords(i) = "{" + RandomWords(i) + "}"

I tried to rebuild the already loaded and sorted list with brackets delimiting the words, but I didn't have the patience to sit and listen to my PC try to sound like an airplane with all the fans going full throttle for who knows how long....
Title: Re: Binary Search Method
Post by: luke on January 25, 2019, 05:42:29 am
And just because we can, here's a recursive binary sort:
Code: QB64: [Select]
  1. DEFLNG A-Z
  2.  
  3. OPEN "words" FOR INPUT AS #1
  4. DIM SHARED WordList(99170) AS STRING
  5.     LINE INPUT #1, WordList(count)
  6.     count = count + 1
  7.  
  8.     INPUT "> ", search$
  9.     IF search$ = "" THEN END
  10.     index = find(search$, 0, UBOUND(wordlist))
  11.     IF index >= 0 THEN
  12.         PRINT "Found "; WordList(index); " at"; index
  13.     ELSE
  14.         PRINT "Not found"
  15.     END IF
  16.  
  17. FUNCTION find (word$, start, finish)
  18.     size = finish - start + 1
  19.     mid = size \ 2
  20.     cmp = _STRCMP(WordList(start + mid), word$)
  21.     SELECT CASE cmp
  22.         CASE 0
  23.             find = start + mid
  24.         CASE IS > 0
  25.             IF size = 1 THEN find = -1 ELSE find = find(word$, start, start + mid - 1)
  26.         CASE IS < 0
  27.             IF size = 1 THEN find = -1 ELSE find = find(word$, start + mid + 1, finish)
  28.     END SELECT

(My word list had all the words beginning with a capital letter listed first, so I'm using _STRCMP)
Title: Re: Binary Search Method
Post by: SMcNeill on January 25, 2019, 09:18:49 am
Use it or not; there it is.

I’d still love to see a similar speed test for your method.  I’m curious how big a difference it could be.

It's open source. Have a ball. (Might wanna optimize it first.)

So, since Stx wasn't willing to do a speed comparison, I did:

Code: QB64: [Select]
  1. DIM SHARED HashTableSize AS LONG
  2. HashTableSize = 300007 ' Best to use a big prime number. Bigger examples are 611953 and 1014729.
  3.  
  4. DIM SHARED LB AS STRING ' Make sure that bcracketing sequences do not appear in the data source, otherwise use (a) special character(s).
  5. LB = "{"
  6. RB = "}"
  7.  
  8. DIM SHARED EnglishDictionary(HashTableSize) AS STRING ' Hash table size does not need to equal the size of the source dictionary itself.
  9.  
  10. OPEN "466544 Word List.txt" FOR BINARY AS #1
  11.  
  12. DIM SHARED WordList(466545) AS STRING
  13. PRINT "Loading library"
  14.     count = count + 1
  15.     LINE INPUT #1, WordList(count)
  16.  
  17. Sort WordList()
  18.  
  19. i = 0
  20. FOR i = 1 TO 466545
  21.     b$ = WordList(i) 'the word to store
  22.     c$ = LTRIM$(RTRIM$(STR$(i))) 'to store the index
  23.     d = HashFunc(b$) ' Calculate the hash value (array address) of the word on hand.
  24.     EnglishDictionary(d) = EnglishDictionary(d) + LB + b$ + RB + LB + c$ + RB
  25. PRINT "Done creating Hash Table."
  26.  
  27. ' Done developing fast lookup tool. Now time for an application.
  28.  
  29. PRINT "Looking up"
  30. DIM RandomWords(50000) AS STRING
  31. FOR i = 1 TO 50000
  32.     c = INT(RND * 466544) + 1
  33.     RandomWords(i) = WordList(c)
  34.  
  35. t# = TIMER
  36. FOR i = 1 TO 50000
  37.     'PRINT "Searching for: "; RandomWords(i),
  38.     l$ = Lookup$(RandomWords(i))
  39.     IF l$ <> "" THEN
  40.         l = INSTR(l$, " ")
  41.         word$ = LEFT$(l$, l - 1)
  42.         index = VAL(MID$(l$, l + 1))
  43.         'PRINT WordList(index) 'to show that we got the index back successfully
  44.         'ELSE
  45.         'PRINT "NOT FOUND!"
  46.     END IF
  47. PRINT USING "###.### seconds lookup using Hash Table"; TIMER - t#
  48.  
  49. t# = TIMER
  50. FOR i = 1 TO 50000
  51.     index = FindIndex(RandomWords(i))
  52.     'PRINT "Searching for: "; RandomWords(i),
  53.     'IF index THEN 'to show that we got the index back successfully
  54.     '    PRINT WordList(index)
  55.     'ELSE
  56.     '    PRINT "NOT FOUND!"
  57.     'END IF
  58. PRINT USING "###.### seconds lookup using Binary Search"; TIMER - t#
  59.  
  60. PRINT "And just to compare results -- the first 15 words:"
  61. FOR i = 1 TO 10
  62.     'PRINT "Searching for: "; RandomWords(i),
  63.     l$ = Lookup$(RandomWords(i))
  64.     IF l$ <> "" THEN
  65.         l = INSTR(l$, " ")
  66.         word$ = LEFT$(l$, l - 1)
  67.         index = VAL(MID$(l$, l + 1))
  68.         PRINT WordList(index), 'to show that we got the index back successfully
  69.     ELSE
  70.         PRINT "NOT FOUND!",
  71.     END IF
  72.     index = FindIndex(RandomWords(i))
  73.     'PRINT "Searching for: "; RandomWords(i),
  74.     IF index THEN 'to show that we got the index back successfully
  75.         PRINT WordList(index)
  76.     ELSE
  77.         PRINT "NOT FOUND!"
  78.     END IF
  79.  
  80.  
  81.  
  82.  
  83. FUNCTION Lookup$ (a AS STRING)
  84.     r$ = ""
  85.     b$ = EnglishDictionary(HashFunc(a))
  86.     c$ = ""
  87.     d$ = ""
  88.     IF b$ <> "" THEN
  89.         DO WHILE c$ <> a
  90.             c$ = ReturnBetween(b$, LB, RB)
  91.             IF c$ = "" THEN EXIT DO
  92.             b$ = RIGHT$(b$, LEN(b$) - LEN(LB + c$ + RB))
  93.             d$ = ReturnBetween(b$, LB, RB)
  94.         LOOP
  95.     END IF
  96.     r$ = a + "  " + d$
  97.     Lookup$ = r$
  98.  
  99. FUNCTION ReturnBetween$ (a AS STRING, b AS STRING, c AS STRING) ' input string, left bracket, right bracket
  100.     i = INSTR(a, b)
  101.     j = INSTR(a, c)
  102.     f = LEN(c)
  103.     ReturnBetween$ = MID$(a, i + f, j - (i + f))
  104.  
  105. FUNCTION HashFunc (a AS STRING) ' input string
  106.     DIM sum AS DOUBLE
  107.     sum = HashTableSize
  108.     FOR k = 1 TO LEN(a)
  109.         sum = sum + k * COS(ASC(MID$(a, k, 1))) ' Without the linear factor of k, permutations have same hash values.
  110.     NEXT
  111.     sum = ABS(VAL(ReplaceSubString$(STR$(sum), ".", "")))
  112.     sum = sum MOD HashTableSize
  113.     HashFunc = sum
  114.  
  115. FUNCTION ReplaceSubString$ (a AS STRING, b AS STRING, c AS STRING)
  116.     j = INSTR(a, b)
  117.     IF j > 0 THEN
  118.         r$ = LEFT$(a, j - 1) + c + ReplaceSubString$(RIGHT$(a, LEN(a) - j + 1 - LEN(b)), b, c)
  119.     ELSE
  120.         r$ = a
  121.     END IF
  122.     ReplaceSubString$ = r$
  123.  
  124.  
  125. SUB Sort (Array() AS STRING)
  126.     'The dice sorting routine, optimized to use _MEM and a comb sort algorithm.
  127.     'It's more than fast enough for our needs here I th ink.  ;)
  128.     gap = UBOUND(array)
  129.     DO
  130.         gap = 10 * gap \ 13
  131.         IF gap < 1 THEN gap = 1
  132.         i = 0
  133.         swapped = 0
  134.         DO
  135.             IF _STRCMP(Array(i), Array(i + gap)) > 0 THEN
  136.                 SWAP Array(i), Array(i + gap)
  137.                 swapped = -1
  138.             END IF
  139.             i = i + 1
  140.         LOOP UNTIL i + gap > UBOUND(Array)
  141.     LOOP UNTIL swapped = 0 AND gap = 1
  142.  
  143. FUNCTION FindIndex (search$)
  144.     SHARED SearchTimes, LastIndex
  145.     SearchTimes = 0
  146.     min = 1 'lbound(wordlist)
  147.     max = 466544 'ubound(wordlist)
  148.  
  149.     DO UNTIL found
  150.         SearchTimes = SearchTimes + 1
  151.         gap = (max + min) \ 2
  152.         compare = _STRCMP(search$, WordList(gap))
  153.         IF compare > 0 THEN
  154.             min = gap
  155.         ELSEIF compare < 0 THEN
  156.             max = gap
  157.         ELSE
  158.             FindIndex = gap
  159.             found = -1
  160.             EXIT FUNCTION
  161.         END IF
  162.         IF max - min < 1 THEN LastIndex = gap: found = -1 'it's not in the list
  163.         ' PRINT min, max, search$, WordList(gap), compare
  164.         ' SLEEP
  165.     LOOP
  166.  
  167.  

In both search instances, we use the same word list, the same 50,000 word search list, and we return both the word we're searching for and its index position in the list.

Now, I imagine that Stx will come along and tell me I'm doing something wrong with the routine he provided, but the screenshot below is the results I get.  If there's something wrong with things, I'm more than happy to learn from whatever the issue is:
Title: Re: Binary Search Method
Post by: STxAxTIC on January 25, 2019, 09:34:54 am
As predicted, it's not apples-to-apples.

Gotta replace my hash function which uses a cosine, and then a completely separate recursive string function just to remove the decimal point. The ReplaceSubString function should be gone from the hash function - use Lukes suggestion instead.

Also your wordlist of 400k+ entries exceeded the value of the hash table size of 300007. You wired it for collisions in other words. Not sure if you noticed or if that's an accidental straw man.

This isn't worth beans until you get an honest comparison.

EDIT

LOL and I just noticed you're using the hash table wrong. Why on earth are you storing the index (as a collision) next to the word? This is a speed test - trim that fat!

... And once we see where this test finally goes, let's push the record limit to the maximum and see how each one does with like, a billion words.
Title: Re: Binary Search Method
Post by: Pete on January 25, 2019, 10:01:20 am
Bill, you made my point about the problems with hashing. With my methods and Steve's method, you don't need a complicated pre-plan, which may need to be overhauled or adjusted someday. Just alphabetize the list, which a one-time never change it algorithm.

I do love the hashing method and it is a damn clever fast and efficient way to set up a list to be searched and indexed. I suppose the merits in any of these methods depends on the use. For instance, indexing doesn't mean squat in a spell check routine, so I need that feature like a fish needs a bicycle. That's one reason why even my quick and dirty QB64 instr() method is appealing to me. So while you keep searching for the Holly Grail of hash, Steve and I just grab a couple of Dixie cups and drink up.

In any event, good luck with your hash refinements and thank you for starting this awesome topic for discussion, originally in the, "Dictionizer" thread, and to Steve for continuing it here. Guys, we need more critical thinking threads like these around here.

Pete
Title: Re: Binary Search Method
Post by: SMcNeill on January 25, 2019, 10:02:36 am
As predicted, it's not apples-to-apples.

Gotta replace my hash function which uses a cosine, and then a completely separate recursive string function just to remove the decimal point. The ReplaceSubString function should be gone from the hash function - use Lukes suggestion instead.

Also your wordlist of 400k+ entries exceeded the value of the hash table size of 300007. You wired it for collisions in other words. Not sure if you noticed or if that's an accidental straw man.

This isn't worth beans until you get an honest comparison.

I told you you were comparing apples to peanut butter...

The Binary Search is a "search and done" method for data retrieval.  The Hash Table is an "instant jump to a spot in memory and then parse a much smaller data set" method for data retrieval.  As long as you're going to have collisions and need to search for those, you have to take that search time into consideration.

Using your suggestions of "Luke's method" + "Table Larger Than List", we end up with the following code:

Code: QB64: [Select]
  1. DIM SHARED HashTableSize AS LONG
  2. HashTableSize = 611953 ' Best to use a big prime number. Bigger examples are 611953 and 1014729.
  3.  
  4. DIM SHARED LB AS STRING ' Make sure that bcracketing sequences do not appear in the data source, otherwise use (a) special character(s).
  5. LB = "{"
  6. RB = "}"
  7.  
  8. DIM SHARED EnglishDictionary(HashTableSize) AS STRING ' Hash table size does not need to equal the size of the source dictionary itself.
  9.  
  10. OPEN "466544 Word List.txt" FOR BINARY AS #1
  11.  
  12. DIM SHARED WordList(466545) AS STRING
  13. PRINT "Loading library"
  14.     count = count + 1
  15.     LINE INPUT #1, WordList(count)
  16.  
  17. Sort WordList()
  18.  
  19. i = 0
  20. FOR i = 1 TO 466545
  21.     b$ = WordList(i) 'the word to store
  22.     c$ = LTRIM$(RTRIM$(STR$(i))) 'to store the index
  23.     d = HashFunc(b$) ' Calculate the hash value (array address) of the word on hand.
  24.     EnglishDictionary(d) = EnglishDictionary(d) + LB + b$ + RB + LB + c$ + RB
  25. PRINT "Done creating Hash Table."
  26.  
  27. ' Done developing fast lookup tool. Now time for an application.
  28.  
  29. PRINT "Looking up"
  30. DIM RandomWords(50000) AS STRING
  31. FOR i = 1 TO 50000
  32.     c = INT(RND * 466544) + 1
  33.     RandomWords(i) = WordList(c)
  34.  
  35. t# = TIMER
  36. FOR i = 1 TO 50000
  37.     'PRINT "Searching for: "; RandomWords(i),
  38.     l$ = Lookup$(RandomWords(i))
  39.     'IF l$ <> "" THEN
  40.     'l = INSTR(l$, " ")
  41.     'word$ = LEFT$(l$, l - 1)
  42.     'index = VAL(MID$(l$, l + 1))
  43.     'PRINT WordList(index) 'to show that we got the index back successfully
  44.     'ELSE
  45.     'PRINT "NOT FOUND!"
  46.     'END IF
  47. PRINT USING "###.### seconds lookup using Hash Table"; TIMER - t#
  48.  
  49. t# = TIMER
  50. FOR i = 1 TO 50000
  51.     index = FindIndex(RandomWords(i))
  52.     'PRINT "Searching for: "; RandomWords(i),
  53.     'IF index THEN 'to show that we got the index back successfully
  54.     '    PRINT WordList(index)
  55.     'ELSE
  56.     '    PRINT "NOT FOUND!"
  57.     'END IF
  58. PRINT USING "###.### seconds lookup using Binary Search"; TIMER - t#
  59.  
  60. PRINT "And just to compare results -- the first 15 words:"
  61. FOR i = 1 TO 10
  62.     'PRINT "Searching for: "; RandomWords(i),
  63.     l$ = Lookup$(RandomWords(i))
  64.     IF l$ <> "" THEN
  65.         l = INSTR(l$, " ")
  66.         word$ = LEFT$(l$, l - 1)
  67.         index = VAL(MID$(l$, l + 1))
  68.         PRINT WordList(index), 'to show that we got the index back successfully
  69.     ELSE
  70.         PRINT "NOT FOUND!",
  71.     END IF
  72.     index = FindIndex(RandomWords(i))
  73.     'PRINT "Searching for: "; RandomWords(i),
  74.     IF index THEN 'to show that we got the index back successfully
  75.         PRINT WordList(index)
  76.     ELSE
  77.         PRINT "NOT FOUND!"
  78.     END IF
  79.  
  80.  
  81.  
  82.  
  83. FUNCTION Lookup$ (a AS STRING)
  84.     r$ = ""
  85.     b$ = EnglishDictionary(HashFunc(a))
  86.     c$ = ""
  87.     d$ = ""
  88.     IF b$ <> "" THEN
  89.         DO WHILE c$ <> a
  90.             c$ = ReturnBetween(b$, LB, RB)
  91.             IF c$ = "" THEN EXIT DO
  92.             b$ = RIGHT$(b$, LEN(b$) - LEN(LB + c$ + RB))
  93.             d$ = ReturnBetween(b$, LB, RB)
  94.         LOOP
  95.     END IF
  96.     r$ = a + "  " + d$
  97.     Lookup$ = r$
  98.  
  99. FUNCTION ReturnBetween$ (a AS STRING, b AS STRING, c AS STRING) ' input string, left bracket, right bracket
  100.     i = INSTR(a, b)
  101.     j = INSTR(a, c)
  102.     f = LEN(c)
  103.     ReturnBetween$ = MID$(a, i + f, j - (i + f))
  104.  
  105. FUNCTION HashFunc (s$)
  106.     DIM hash~&, i
  107.     hash~& = 5381
  108.     FOR i = 1 TO LEN(s$)
  109.         hash~& = ((hash~& * 33) XOR ASC(s$, i)) MOD 611953
  110.     NEXT i
  111.     HashFunc = hash~&
  112.  
  113.  
  114. FUNCTION ReplaceSubString$ (a AS STRING, b AS STRING, c AS STRING)
  115.     j = INSTR(a, b)
  116.     IF j > 0 THEN
  117.         r$ = LEFT$(a, j - 1) + c + ReplaceSubString$(RIGHT$(a, LEN(a) - j + 1 - LEN(b)), b, c)
  118.     ELSE
  119.         r$ = a
  120.     END IF
  121.     ReplaceSubString$ = r$
  122.  
  123.  
  124. SUB Sort (Array() AS STRING)
  125.     'The dice sorting routine, optimized to use _MEM and a comb sort algorithm.
  126.     'It's more than fast enough for our needs here I th ink.  ;)
  127.     gap = UBOUND(array)
  128.     DO
  129.         gap = 10 * gap \ 13
  130.         IF gap < 1 THEN gap = 1
  131.         i = 0
  132.         swapped = 0
  133.         DO
  134.             IF _STRCMP(Array(i), Array(i + gap)) > 0 THEN
  135.                 SWAP Array(i), Array(i + gap)
  136.                 swapped = -1
  137.             END IF
  138.             i = i + 1
  139.         LOOP UNTIL i + gap > UBOUND(Array)
  140.     LOOP UNTIL swapped = 0 AND gap = 1
  141.  
  142. FUNCTION FindIndex (search$)
  143.     SHARED SearchTimes, LastIndex
  144.     SearchTimes = 0
  145.     min = 1 'lbound(wordlist)
  146.     max = 466544 'ubound(wordlist)
  147.  
  148.     DO UNTIL found
  149.         SearchTimes = SearchTimes + 1
  150.         gap = (max + min) \ 2
  151.         compare = _STRCMP(search$, WordList(gap))
  152.         IF compare > 0 THEN
  153.             min = gap
  154.         ELSEIF compare < 0 THEN
  155.             max = gap
  156.         ELSE
  157.             FindIndex = gap
  158.             found = -1
  159.             EXIT FUNCTION
  160.         END IF
  161.         IF max - min < 1 THEN LastIndex = gap: found = -1 'it's not in the list
  162.         ' PRINT min, max, search$, WordList(gap), compare
  163.         ' SLEEP
  164.     LOOP
  165.  
  166.  

Faster than previously by about half, but still about twice as slow as the  binary search method...

Regardless; I think the results speak to show that when your data is already sorted, doing a binary search of it is rather quite efficient. 

Anyway, I think I've did as you asked:

Quote
I think the burden's on you to do the speed test m8 (for a large data set with lots of performance demand).

If it's *wrong* at this point, I think the burden's on you to show us where and how.
Title: Re: Binary Search Method
Post by: STxAxTIC on January 25, 2019, 10:11:06 am
Nah, I don't need to prove shit to you guys again on this. Fun talk though. Someone who takes the time to sift through this convo, and the way Steve split it among two threads, will get a real chuckle.

If you really want a *win* on this Steve, adjust it for a dataset size typical of a respectable database. A few billion if you can. There's the bait. You might even be right. Try a billion or so.

EDIT:

shit car's been running. lemme fix this up later
Title: Re: Binary Search Method
Post by: Pete on January 25, 2019, 10:29:42 am
Check your refrigerator. If it's running too, the chances are it's having an affair with your car. You'll probably be back soon, because history proves frigid relationships don't last too long.

Seriously Bill, billions of data entries and a hash algorithm to accommodate that project?  What, thousands of collisions that need to be sorted? I would think at that level, you would at least need to combine our methods to get adequate results.

Pete
Title: Re: Binary Search Method
Post by: SMcNeill on January 25, 2019, 11:15:27 am
...adjust it for a dataset size typical of a respectable database. A few billion if you can. There's the bait. You might even be right. Try a billion or so.

Here's you a list with 10 million "words" in it, but it'll easily scale to whatever size you want (as long as you have the memory limits to run it):

Code: QB64: [Select]
  1. DEFLNG A-Z
  2. DIM SHARED HashTableSize AS LONG
  3. HashTableSize = 10000019 ' Best to use a big prime number. Bigger examples are 611953 and 1014729.
  4.  
  5. DIM SHARED LB AS STRING ' Make sure that bcracketing sequences do not appear in the data source, otherwise use (a) special character(s).
  6. LB = "{"
  7. RB = "}"
  8.  
  9. PRINT "Reserving memory for Hashtable..."
  10. DIM SHARED EnglishDictionary(HashTableSize) AS STRING ' Hash table size does not need to equal the size of the source dictionary itself.
  11.  
  12.  
  13. CONST limit = 10000000
  14. DIM SHARED WordList(limit) AS STRING
  15.  
  16. PRINT "Generating List..."
  17. FOR i = 1 TO limit
  18.     WordList(i) = STR$(i) 'generate a list of "words"
  19.  
  20. PRINT "Sorting list to alphabeticlize it..."
  21. Sort WordList() 'to put it in alphabetic order and not numeric
  22.  
  23. PRINT "Creating Hash Table..."
  24. FOR i = 1 TO limit
  25.     b$ = WordList(i) 'the word to store
  26.     d = HashFunc(b$) ' Calculate the hash value (array address) of the word on hand.
  27.     EnglishDictionary(d) = EnglishDictionary(d) + LB + b$ + RB
  28. PRINT "Done creating Hash Table."
  29.  
  30. ' Done developing fast lookup tool. Now time for an application.
  31.  
  32. PRINT "Generating search list..."
  33. DIM RandomWords(limit) AS STRING
  34. RandomWords() = WordList() 'lets just look up all the damn words
  35.  
  36.  
  37. PRINT "Searching using Hash Table"
  38. t# = TIMER
  39. FOR i = 1 TO limit
  40.     l$ = Lookup$(RandomWords(i))
  41. PRINT USING "###.### seconds lookup using Hash Table"; TIMER - t#
  42.  
  43. PRINT "Searching using Binary Search"
  44. t# = TIMER
  45. FOR i = 1 TO limit
  46.     index = FindIndex(RandomWords(i))
  47.     'PRINT i, index, "?"; RandomWords(index)
  48.     'SLEEP
  49. PRINT USING "###.### seconds lookup using Binary Search"; TIMER - t#
  50.  
  51.  
  52. FUNCTION Lookup$ (a AS STRING)
  53.     r$ = ""
  54.     b$ = EnglishDictionary(HashFunc(a))
  55.     c$ = ""
  56.     d$ = ""
  57.     IF b$ <> "" THEN
  58.         DO WHILE c$ <> a
  59.             c$ = ReturnBetween(b$, LB, RB)
  60.             IF c$ = "" THEN EXIT DO
  61.             b$ = RIGHT$(b$, LEN(b$) - LEN(LB + c$ + RB))
  62.             'd$ = ReturnBetween(b$, LB, RB)
  63.         LOOP
  64.     END IF
  65.     r$ = a '+ "  " + d$
  66.     Lookup$ = r$
  67.  
  68. FUNCTION ReturnBetween$ (a AS STRING, b AS STRING, c AS STRING) ' input string, left bracket, right bracket
  69.     i = INSTR(a, b)
  70.     j = INSTR(a, c)
  71.     f = LEN(c)
  72.     ReturnBetween$ = MID$(a, i + f, j - (i + f))
  73.  
  74. FUNCTION HashFunc (s$)
  75.     DIM hash~&, i
  76.     hash~& = 5381
  77.     FOR i = 1 TO LEN(s$)
  78.         hash~& = ((hash~& * 33) XOR ASC(s$, i)) MOD HashTableSize
  79.     NEXT i
  80.     HashFunc = hash~&
  81.  
  82.  
  83. FUNCTION ReplaceSubString$ (a AS STRING, b AS STRING, c AS STRING)
  84.     j = INSTR(a, b)
  85.     IF j > 0 THEN
  86.         r$ = LEFT$(a, j - 1) + c + ReplaceSubString$(RIGHT$(a, LEN(a) - j + 1 - LEN(b)), b, c)
  87.     ELSE
  88.         r$ = a
  89.     END IF
  90.     ReplaceSubString$ = r$
  91.  
  92.  
  93. SUB Sort (Array() AS STRING)
  94.     'The dice sorting routine, optimized to use _MEM and a comb sort algorithm.
  95.     'It's more than fast enough for our needs here I th ink.  ;)
  96.     gap = UBOUND(array)
  97.     DO
  98.         gap = 10 * gap \ 13
  99.         IF gap < 1 THEN gap = 1
  100.         i = 0
  101.         swapped = 0
  102.         DO
  103.             IF _STRCMP(Array(i), Array(i + gap)) > 0 THEN
  104.                 SWAP Array(i), Array(i + gap)
  105.                 swapped = -1
  106.             END IF
  107.             i = i + 1
  108.         LOOP UNTIL i + gap > UBOUND(Array)
  109.     LOOP UNTIL swapped = 0 AND gap = 1
  110.  
  111. FUNCTION FindIndex (search$)
  112.     SHARED SearchTimes, LastIndex
  113.     SearchTimes = 0
  114.     min = 1 'lbound(wordlist)
  115.     max = limit + 1
  116.  
  117.     DO UNTIL found
  118.         SearchTimes = SearchTimes + 1
  119.         gap = (max + min) \ 2
  120.         compare = _STRCMP(search$, WordList(gap))
  121.         IF compare > 0 THEN
  122.             min = gap
  123.         ELSEIF compare < 0 THEN
  124.             max = gap
  125.         ELSE
  126.             FindIndex = gap
  127.             found = -1
  128.             EXIT FUNCTION
  129.         END IF
  130.         IF max - min < 1 THEN LastIndex = gap: found = -1 'it's not in the list
  131.         'PRINT min, max, search$, WordList(gap), compare
  132.         'SLEEP
  133.     LOOP
  134.  

One word of caution though:  This won't run in QB64x32...

You wanted a hash table size larger than the data set ("Also your wordlist of 400k+ entries exceeded the value of the hash table size of 300007. You wired it for collisions in other words."), and simply reserving space for it in memory takes up about 2.6GB of RAM, as you can see just from the snippet below:

Code: QB64: [Select]
  1. DEFLNG A-Z
  2. DIM SHARED HashTableSize AS LONG
  3. HashTableSize = 10000019 ' Best to use a big prime number. Bigger examples are 611953 and 1014729.
  4.  
  5. DIM SHARED LB AS STRING ' Make sure that bcracketing sequences do not appear in the data source, otherwise use (a) special character(s).
  6. LB = "{"
  7. RB = "}"
  8.  
  9. PRINT "Reserving memory for Hashtable..."
  10. DIM SHARED EnglishDictionary(HashTableSize) AS STRING ' Hash table size does not need to equal the size of the source dictionary itself.

Once the data is all filled and in memory, it requires 3,418.6MB to run without  crashing.

(And, for what it's worth, the binary search still beats the hash table for speed, on my PC, with one taking 6 seconds, the other taking 4.5.)
Title: Re: Binary Search Method
Post by: _vince on January 25, 2019, 11:56:03 am
Steve, the big-O notation is designed to specifically not compare apples to oranges. Dicscrete calculus will be a lesson for another time... I just cant right now, my facepalm hurts.

Dicscrete calculus, Bill? I think you'd better chose a search method and get that spell checker up and working for craps sake.

If you mean discrete calculus, is that one of those math courses offered at night for active singles? If so, what is the probably of completing the course without contracting a nasty case of standard deviations?

Pete :D

Either way, the dic will secrete
Title: Re: Binary Search Method
Post by: Pete on January 25, 2019, 12:01:38 pm
They say comedy is contagious, but I'm not sure anyone wants to catch it that way.

Pete :D
Title: Re: Binary Search Method
Post by: Ed Davis on January 25, 2019, 12:04:24 pm
Well, I thought I would give this a try.

I took the original code, and tried to make it "apples to apples" as much as possible.

Note that for the hash function proper, no need to do a mod on each iteration.  One at the end is enough.  Also, I redid the hash lookup, as it was doing lots of extra work.

On my machine, hashing wins: .109 vs .219.  About twice as fast.

Code: QB64: [Select]
  1. DIM SHARED HashTableSize AS LONG
  2. HashTableSize = 500009' Best to use a big prime number. Bigger examples are 611953 and 1014729.
  3.  
  4. DIM SHARED LB AS STRING ' Make sure that bcracketing sequences do not appear in the data source, otherwise use (a) special character(s).
  5. LB = "{"
  6. RB = "}"
  7.  
  8. DIM SHARED EnglishDictionary(HashTableSize) AS STRING ' Hash table size does not need to equal the size of the source dictionary itself.
  9.  
  10. OPEN "466544 Word List.txt" FOR BINARY AS #1
  11.  
  12. DIM SHARED WordList(466545) AS STRING
  13. PRINT "Loading library"
  14.     count = count + 1
  15.     LINE INPUT #1, WordList(count)
  16.  
  17. print "Sorting wordlist"
  18. Sort WordList()
  19.  
  20. print "Creating Hash Table"
  21. i = 0
  22. FOR i = 1 TO ubound(wordlist)
  23.     b$ = WordList(i) 'the word to store
  24.     d = HashFunc(b$) ' Calculate the hash value (array address) of the word on hand.
  25.     if d < 0 then
  26.         print "hash out of range"
  27.         end
  28.     end if
  29.     EnglishDictionary(d) = EnglishDictionary(d) + LB + b$ + RB
  30.  
  31. ' Done developing fast lookup tool. Now time for an application.
  32.  
  33. const ITER = 75000
  34.  
  35. PRINT "Creating "; ITER; " random word lookup table"
  36. DIM RandomWords(ITER) AS STRING
  37. FOR i = 1 TO ITER
  38.     c = INT(RND * 466544) + 1
  39.     RandomWords(i) = WordList(c)
  40.  
  41. Print "Search via Hashing...       ";
  42. t# = TIMER
  43. FOR i = 1 TO ITER
  44.     l$ = Lookup$(RandomWords(i))
  45.     if l$ <> RandomWords(i) then
  46.         print "HashTable error - searched: ", RandomWords(i), " found: ", l$
  47.         end
  48.     end if
  49. PRINT USING "###.### seconds lookup using Hash Table"; TIMER - t#
  50.  
  51. Print "search via Binary Search... ";
  52. t# = TIMER
  53. FOR i = 1 TO ITER
  54.     index = FindIndex(RandomWords(i))
  55.     if WordList(index) <> RandomWords(i) then
  56.         print "BinarySearch error - searched: ", RandomWords(i), " found: ", WordList(index)
  57.     end if
  58. PRINT USING "###.### seconds lookup using Binary Search"; TIMER - t#
  59.  
  60. PRINT "And just to compare results -- the first 10 words:"
  61. FOR i = 1 TO 10
  62.     'PRINT "Searching for: "; RandomWords(i),
  63.     l$ = Lookup$(RandomWords(i))
  64.     IF l$ <> "" THEN
  65.         l = INSTR(l$, " ")
  66.         word$ = LEFT$(l$, l - 1)
  67.         index = VAL(MID$(l$, l + 1))
  68.         PRINT WordList(index), 'to show that we got the index back successfully
  69.     ELSE
  70.         PRINT "NOT FOUND!",
  71.     END IF
  72.     index = FindIndex(RandomWords(i))
  73.     'PRINT "Searching for: "; RandomWords(i),
  74.     IF index THEN 'to show that we got the index back successfully
  75.         PRINT WordList(index)
  76.     ELSE
  77.         PRINT "NOT FOUND!"
  78.     END IF
  79.  
  80. FUNCTION Lookup$ (s AS STRING)
  81.     haystack$ = EnglishDictionary(HashFunc(s))
  82.     needle$ = LB + s + RB
  83.     p = instr(haystack$, needle$)
  84.     if p = 0 then
  85.         Lookup$ = ""
  86.     else
  87.         Lookup$ = mid$(haystack$, p + 1, LEN(needle$) - 2)
  88.     end if
  89.  
  90. FUNCTION HashFunc (s AS STRING) ' input string
  91.     DIM hash~&, i
  92.  
  93.     hash~& = 5381
  94.  
  95.     FOR i = 1 TO LEN(s$)
  96.         hash~& = ((hash~& * 33) XOR ASC(s$, i))
  97.     NEXT i
  98.  
  99.     HashFunc = hash~& mod HashTableSize
  100.  
  101. SUB Sort (Array() AS STRING)
  102.     'The dice sorting routine, optimized to use _MEM and a comb sort algorithm.
  103.     'It's more than fast enough for our needs here I think.  ;)
  104.     gap = UBOUND(array)
  105.     DO
  106.         gap = 10 * gap \ 13
  107.         IF gap < 1 THEN gap = 1
  108.         i = 0
  109.         swapped = 0
  110.         DO
  111.             IF _STRCMP(Array(i), Array(i + gap)) > 0 THEN
  112.                 SWAP Array(i), Array(i + gap)
  113.                 swapped = -1
  114.             END IF
  115.             i = i + 1
  116.         LOOP UNTIL i + gap > UBOUND(Array)
  117.     LOOP UNTIL swapped = 0 AND gap = 1
  118.  
  119. FUNCTION FindIndex (search$)
  120.     SHARED SearchTimes, LastIndex
  121.     SearchTimes = 0
  122.     min = lbound(wordlist)
  123.     max = ubound(wordlist)
  124.  
  125.     DO UNTIL found
  126.         SearchTimes = SearchTimes + 1
  127.         gap = (max + min) \ 2
  128.         compare = _STRCMP(search$, WordList(gap))
  129.         IF compare > 0 THEN
  130.             min = gap
  131.         ELSEIF compare < 0 THEN
  132.             max = gap
  133.         ELSE
  134.             FindIndex = gap
  135.             found = -1
  136.             EXIT FUNCTION
  137.         END IF
  138.         IF max - min < 1 THEN LastIndex = gap: found = -1 'it's not in the list
  139.         ' PRINT min, max, search$, WordList(gap), compare
  140.         ' SLEEP
  141.     LOOP
  142.  
  143.  

Many years ago (1990 to be precise), when I was needing something like this for a project I was doing in C, I ran extensive tests on different search functions - avltree, binarytree, binary search, hashing (I tested 135 different hash functions), custom lists tuned to the data, and a few more.

I reran some of those tests this morning.

In C at least, with my code (e.g., a custom binary search, instead of the built-in bsearch), hashing is about 2 times faster than binary searching.  So pretty much what I found for the code above.

Per the literature, it is even faster to use a power of 2 for the table size, rather than a prime for the table size, and then and off the appropriate bits instead of doing the mod at the end of the hash function.  In my tests with C, I did find this to be true, although it was just slightly faster, not strikingly so.  But of course, when you have potentially many millions of lookups, then every bit of speed counts.  Also note that the power of 2 table size assumes a high quality hash function.  Otherwise, it is best to stick with a prime number as the table size.

Anyway, cool topic!  Thanks to everyone who shared something on this!
Title: Re: Binary Search Method
Post by: SMcNeill on January 25, 2019, 12:23:35 pm
I’m getting perfectly equal time tests on my PC, with both taking 0.164 seconds to run.  Either way, both methods are generally fast enough to hold up and work in most situations.

The biggest advantage to the hash list is that you dont need sorted data, whereas a binary search has lower memory requirements.  My advice is to choose whichever you prefer; either will usually work in most cases.
Title: Re: Binary Search Method
Post by: SMcNeill on January 25, 2019, 01:32:58 pm
Taking Ed's improvements, I played around and tweaked the hash routine even further, making it a  much faster program:

Code: QB64: [Select]
  1. DIM SHARED HashTableSize AS LONG
  2. HashTableSize = 500009 ' Best to use a big prime number. Bigger examples are 611953 and 1014729.
  3.  
  4. DIM SHARED EnglishDictionary(HashTableSize) AS STRING ' Hash table size does not need to equal the size of the source dictionary itself.
  5.  
  6. OPEN "466544 Word List.txt" FOR BINARY AS #1
  7.  
  8. DIM SHARED WordList(466545) AS STRING
  9. PRINT "Loading library"
  10.     count = count + 1
  11.     LINE INPUT #1, WordList(count)
  12.  
  13. PRINT "Creating Hash Table"
  14. i = 0
  15. FOR i = 1 TO UBOUND(wordlist)
  16.     b$ = WordList(i) 'the word to store
  17.     d = HashFunc(b$) ' Calculate the hash value (array address) of the word on hand.
  18.     IF d < 0 THEN
  19.         PRINT "hash out of range"
  20.         END
  21.     END IF
  22.     EnglishDictionary(d) = EnglishDictionary(d) + CHR$(LEN(b$)) + b$
  23.  
  24. ' Done developing fast lookup tool. Now time for an application.
  25.  
  26.  
  27.  
  28. CONST ITER = 75000
  29.  
  30. PRINT "Creating "; ITER; " random word lookup table"
  31. DIM RandomWords(ITER) AS STRING
  32. FOR i = 1 TO ITER
  33.     c = INT(RND * 466544) + 1
  34.     RandomWords(i) = WordList(c)
  35.  
  36. PRINT "Search via Hashing...       ";
  37. t# = TIMER
  38. FOR i = 1 TO ITER
  39.     l$ = Lookup$(RandomWords(i))
  40.     IF l$ <> RandomWords(i) THEN
  41.         PRINT "HashTable error - searched: ", RandomWords(i), " found: ", l$
  42.         END
  43.     END IF
  44. PRINT USING "###.### seconds lookup using Hash Table"; TIMER - t#
  45.  
  46. FUNCTION Lookup$ (s AS STRING)
  47.     haystack$ = EnglishDictionary(HashFunc(s))
  48.     size = 1: l = 1
  49.     DO UNTIL needle$ = s OR size = 0
  50.         size = ASC(haystack$, l)
  51.         needle$ = MID$(haystack$, l + 1, size)
  52.         l = l + size + 1
  53.     LOOP
  54.     IF size > 0 THEN Lookup$ = needle$
  55.  
  56. FUNCTION HashFunc (s AS STRING) ' input string
  57.     DIM hash~&, i
  58.     hash~& = 5381
  59.     FOR i = 1 TO LEN(s$)
  60.         hash~& = ((hash~& * 33) XOR ASC(s$, i))
  61.     NEXT i
  62.     HashFunc = hash~& MOD HashTableSize

What's the main difference here?

EnglishDictionary(d) = EnglishDictionary(d) + CHR$(LEN(b$)) + b$

We're not storing our data separated by delimiters.  Instead, we're manually recording the size of each entry and using it to directly retrieve our information.  (Side note: This is a simple implementation of those dreaded "linked lists" which STx is so obsessive over.  It's hard to believe he didn't implement one from the start, since he's the resident guru on the subject for us!)

By going this route, we don't need to rely on INSTR to search for the various entries for us, giving us two distinct advantages:

1) It's much faster to "jump" from word to word, than it is to search byte by byte for a match.

2) It's less likely to generate any false positives, as we're retrieving the exact string from memory and comparing.  _INSTR("dogfood", "dog") is a match -- there's a "dog" in "dogfood", but "dog" <> "dogfood" if we compare directly.

Like most code, when it comes to speed, the devil's in the details.
Title: Re: Binary Search Method
Post by: Pete on January 25, 2019, 01:48:36 pm
I've already pointed out INSTR requires use of a delimiter like a comma.

dog - dogfood is easily solved that way...

wordlist$ = ",bill,cat,dog,dogfood,pete,steve,"
a = INSTR(wordlist$,",dog,")

dogfood would not be returned, just dog. Besides... dog food is two words. :P

Hey, did you try my INSTR search and frequency sample? It's dog gone fast!
https://www.qb64.org/forum/index.php?topic=1001.msg102017#msg102017

Pete
Title: Re: Binary Search Method
Post by: SMcNeill on January 25, 2019, 02:04:03 pm
I've already pointed out INSTR requires use of a delimiter like a comma.

dog - dogfood is easily solved that way...

wordlist$ = ",bill,cat,dog,dogfood,pete,steve,"
a = INSTR(wordlist$,",dog,")

dogfood would not be returned, just dog. Besides... dog food is two words. :P

Hey, did you try my INSTR search and frequency sample? It's dog gone fast!
https://www.qb64.org/forum/index.php?topic=1001.msg102017#msg102017

Pete

That's the point I was making with the previous post -- instead of using INSTR at all, which requires you to SEARCH the string for your words, store the length in front of each entry instead.

wordlist$ = "4bill3cat3dog7dogfood4pete5steve"

here you read 4, then get 4 characters for a string -- "bill"
then read 3, get 3 characters for "cat"...
then read 3, get 4 characters for "dog" -- MATCH!

Your method works like this:

wordlist$ = ",bill,cat,dog,dogfood,pete,steve,"
a = INSTR(wordlist$,",dog,")

byte 1 -- ",bill" <> ",dog"
byte 2 -- "bill," <> ",dog,"
byte 3 -- "ill,c" <> ",dog,"
byte 4 -- "ll,ca" <> ",dog,"
... and on byte by byte until ",dog," = ",dog,"

That's a 33 character string, and ",dog," is 5 characters, so there's up to 28 checks that the computer is doing behind the scene to look for the first occurrence of your string...

compared to a max of 6 comparisons if we simple read each word all at once. 

When dealing with a process which may be repeated in multiple loops, over and over, it ends up making a difference in performance.  ;)
Title: Re: Binary Search Method
Post by: Pete on January 25, 2019, 02:12:07 pm
But that's what seed in INSTR() is for, so you skip from delimiter to delimiter.

See the same post and the second code box, which uses seed% to deal with lookup frequency within a random word list.

Pete
Title: Re: Binary Search Method
Post by: SMcNeill on January 25, 2019, 07:26:23 pm
But that's what seed in INSTR() is for, so you skip from delimiter to delimiter.

See the same post and the second code box, which uses seed% to deal with lookup frequency within a random word list.

Pete

Seed gives you a starting point, but you still check byte by byte for a match...

a$ = “,1,2,3,4,5,3,7,8,”

seed = INSTR(seed + 1, a$, “,3,”)

Run first, it does a compare at byte 1, then byte 2, then 3... at byte 5 it finds “,3,” — 5 comparisons.

Then you have:

seed = seed + LEN(“,3,”)  ‘5 + 3
seed = INSTR(seed, a$, “,3,”)

Now you start at byte 8 and search byte by byte until you get to byte 11... 5 more comparisons.

INSTR by its nature is a byte by byte compare routine (it has to be, or else how would it know where the search string appears at?  A random guess?)



wordlist$ = "4bill3cat3dog7dogfood4pete5steve0”

Placing the size first lets you advance by words, not by bytes

See the difference, and why one is faster than the other?
Title: Re: Binary Search Method
Post by: STxAxTIC on January 26, 2019, 06:00:08 pm
For some fun trivia, the post (I think three) above this one reminds me of the heartbleed bug:

https://xkcd.com/1354/ (https://xkcd.com/1354/)
Title: Re: Binary Search Method
Post by: Pete on January 27, 2019, 12:51:11 pm
But that's what seed in INSTR() is for, so you skip from delimiter to delimiter.

See the same post and the second code box, which uses seed% to deal with lookup frequency within a random word list.

Pete

Seed gives you a starting point, but you still check byte by byte for a match...

a$ = “,1,2,3,4,5,3,7,8,”

seed = INSTR(seed + 1, a$, “,3,”)

Run first, it does a compare at byte 1, then byte 2, then 3... at byte 5 it finds “,3,” — 5 comparisons.

Then you have:

seed = seed + LEN(“,3,”)  ‘5 + 3
seed = INSTR(seed, a$, “,3,”)

Now you start at byte 8 and search byte by byte until you get to byte 11... 5 more comparisons.

INSTR by its nature is a byte by byte compare routine (it has to be, or else how would it know where the search string appears at?  A random guess?)



wordlist$ = "4bill3cat3dog7dogfood4pete5steve0”

Placing the size first lets you advance by words, not by bytes

See the difference, and why one is faster than the other?

So you are proposing a frequency search be done something like this?

Code: QB64: [Select]
  1. s$ = "4bill3cat3dog7dogfood4pete5steve4bill3cat3dog7dogfood4pete5steve4bill3cat3dog7dogfood4pete5steve0"
  2.  
  3. ' Find the whole word dog and frequency.
  4. FOR i = 1 TO LEN(s$)
  5.     index = VAL(MID$(s$, i, 1)) ' Good for words 9 letters or less.
  6.     x$ = MID$(s$, i + 1, index)
  7.     i = i + index
  8.     IF x$ = "dog" THEN PRINT x$
Title: Re: Binary Search Method
Post by: SMcNeill on January 27, 2019, 01:12:26 pm
So you are proposing a frequency search be done something like this?

Code: QB64: [Select]
  1. s$ = "4bill3cat3dog7dogfood4pete5steve4bill3cat3dog7dogfood4pete5steve4bill3cat3dog7dogfood4pete5steve0"
  2.  
  3. ' Find the whole word dog and frequency.
  4. FOR i = 1 TO LEN(s$)
  5.     index = VAL(MID$(s$, i, 1)) ' Good for words 9 letters or less.
  6.     x$ = MID$(s$, i + 1, index)
  7.     i = i + index
  8.     IF x$ = "dog" THEN PRINT x$

Aye.  No need to use INSTR for your search (it's inherently slower by design); just read each word and compare directly.

The only real difference I'd use would be ASCII characters instead of numeric  ones, as we can deal and process those much faster.

s$ = CHR$(4) + "bill" + CHR$(3) + "cat" + CHR$(3) + "dog" + ....  (so on)

Then just use index = ASC(s$, i).

Gives return values up to 255 characters in length, and processes faster than your VAL(MID$())
Title: Re: Binary Search Method
Post by: Pete on January 27, 2019, 01:31:03 pm
Well I use ASCII characters in files as delimiters all the time, but in this case there are a few considerations.

1) ASCII characters can't be letters so there is no way to separate them from the letters in the word list. Of course if the word list is limited in size, at least the extended ASCII character set could be used for words up to 128 characters. Ah, the good ol' days when supercalifragilisticexpialidocious was the longest word in the English language. Now it's: pneumonoultramicroscopicsilicovolcanoconiosis for all you sandblasting fans. So it looks like going with the extended character set would be fine.

2) Characters like CHR$(7) do not get printed in a string, so a workaround is needed in these cases. Again, I'd go with the extended set and just subtract 127.

Pete
Title: Re: Binary Search Method
Post by: SMcNeill on January 27, 2019, 01:48:46 pm
I guess that all depends on what you need to do with your list... 

If you need to print it, then use _CONTROLCHR OFF:

_CONTROLCHR OFF
FOR I = 0 TO 255: PRINT CHR$(I): NEXT



Normally, though, the size would be something you hide from your user, so they never see it at all.

To build s$, you’d often do something like:

DATA bill, cat, dog

DO
   READ temp$
   s$ = CHR$(LEN(temp$)) + temp$
LOOP

s$ = s$ + CHR$(0) ‘to designate end of list



And to print it, you’d print it much like you did the 3 “dog” above, though you might want to add a nice comma between words just to make your list look pretty.  ;)
Title: Re: Binary Search Method
Post by: Pete on January 27, 2019, 02:06:53 pm
So your method would be...

Code: QB64: [Select]
  1.  ' Convert data format first...
  2. s$ = "4bill3cat3dog7dogfood4pete5steve4bill3cat3dog7dogfood4pete5steve4bill3cat3dog7dogfood4pete5steve0"
  3. 'convert format...
  4. FOR i = 1 TO LEN(s$)
  5.     index = VAL(MID$(s$, i, 1))
  6.     x$ = MID$(s$, i + 1, index)
  7.     i = i + index
  8.     news$ = news$ + CHR$(index) + x$
  9. s$ = news$ + CHR$(0)
  10.  
  11. ' Now your part..
  12. FOR i = 1 TO LEN(s$)
  13.     index = ASC(s$, i) ' Good for words 9 letters or less.
  14.     x$ = MID$(s$, i + 1, index)
  15.     i = i + index
  16.     IF x$ = "dog" THEN PRINT CHR$(index), x$
  17.  

Look right?

Note: The board wouldn't allow pasted ascii characters for the index in the code. It dropped them out of the s$ variable so I re-posted with a conversion method.

Pete
Title: Re: Binary Search Method
Post by: SMcNeill on January 27, 2019, 02:27:37 pm
Looks good.  As you can see in the post under Ed’s, it’s almost exactly what I did for the hash table — speeding it up 300% + on my PC.  https://www.qb64.org/forum/index.php?topic=1003.msg102044#msg102044

The longer the average words in your list, the greater the difference you’ll see in speed vs using INSTR as you were before. 
Title: Re: Binary Search Method
Post by: Pete on January 27, 2019, 04:25:10 pm
OK, I like that method because it's clever, and I learned something new about a QB64 underscore keyword I haven't used before but as for speed, we need to work on that, because I get the INSTR() method as faster when a test is set up as follows...

Code: QB64: [Select]
  1. WIDTH 120, 25
  2. s$ = "4bill3cat3dog7dogfood4pete5steve4bill3cat3dog7dogfood4pete5steve4bill3cat3dog7dogfood4pete5steve0"
  3. 'convert format...
  4. FOR i = 1 TO LEN(s$)
  5.     index = VAL(MID$(s$, i, 1))
  6.     x$ = MID$(s$, i + 1, index)
  7.     i = i + index
  8.     news$ = news$ + CHR$(index) + x$
  9.  
  10. FOR i = 1 TO 15 ' Concatenate to a string size of 3178497
  11.     news$ = news$ + news$
  12. s$ = news$: news$ = ""
  13.  
  14. PRINT " Start timer...": PRINT
  15. t# = TIMER
  16. FOR i = 1 TO LEN(s$)
  17.     index = ASC(s$, i) ' Good for words 9 letters or less.
  18.     x$ = MID$(s$, i + 1, index)
  19.     i = i + index
  20.     IF x$ = "dog" THEN k = k + 1
  21. PRINT TIMER - t#;
  22. PRINT "seconds to find frequency of dog in search INDEX method. Frequency ="; k; "Length of string ="; LEN(s$)
  23.  
  24. k = 0: _DELAY 1: PRINT
  25.  
  26. news$ = ",bill,cat,dog,dogfood,pete,steve,bill,cat,dog,dogfood,pete,steve,bill,cat,dog,dogfood,pete,steve,"
  27. FOR i = 1 TO 15
  28.     news$ = news$ + news$
  29. s$ = news$: news$ = ""
  30.  
  31. PRINT " Start timer...": PRINT
  32. t# = TIMER
  33.     seed& = INSTR(seed&, s$, ",dog,")
  34.     IF seed& = 0 THEN EXIT DO
  35.     x$ = MID$(s$, seed& + 1, LEN("dog"))
  36.     k = k + 1
  37.     seed& = seed& + LEN(",dog,")
  38.  
  39. PRINT TIMER - t#;
  40. PRINT "seconds to find frequency of dog in search INSTR method. Frequency ="; k; "Length of string ="; LEN(s$)
  41.  

My belief is that seed advances INSTR past the byte by byte evaluation, much like your indexing method, but faster. I know we need to make sure the comparison is fair, so if you see anything that is skewing the results, let's work it out.

Pete
Title: Re: Binary Search Method
Post by: SMcNeill on January 28, 2019, 03:13:08 am
I had to do a little pondering to sort out what the heck is going on to make INSTR faster than jumping with the index and size, and what I've came up with is the same old conclusion I've gathered in the past:  There's a lot of overhead in QB64 functions.

    x$ = MID$(s$, i + 1, index)    <--MID$ is slower than one would like.

    index = ASC(s$, i) ' Good for words 9 letters or less.   <--ASC is faster than MID$, but still nothing to write home about.

    IF x$ = "dog" THEN k = k + 1  <--And then you do a direct string compare....

VS:

   seed& = INSTR(seed&, s$, ",dog,")  <--Check the result
    IF seed& = 0 THEN EXIT DO
    x$ = MID$(s$, seed& + 1, LEN("dog"))  <--WTH is this line in here for?  It actually does NOTHING for the check...
    k = k + 1
    seed& = seed& + LEN(",dog,")

So, for the comparison to be as fair as possible, we need to remove as much overhead from the program as we can:

Code: QB64: [Select]
  1.  SCREEN _NEWIMAGE(800, 600, 32)
  2.  
  3. DEFLNG A-Z
  4.  
  5. CONST Limit = 100 'number of words in our list
  6. CONST Repitition = 10000 'number of times to search the lists
  7.  
  8. DIM word(Limit) AS STRING, index AS _UNSIGNED _BYTE
  9.  
  10.  
  11. 'make an array of the proper sizes becauseQB64 dosesn't
  12. 'understand (AS STRING * variable) for a data type
  13. 'so we're setting an array for strings of size 1  to 20, just as a quick placeholder for use with mem.
  14. DIM Strings(1 TO 20) AS STRING
  15. FOR i = 1 TO 20: Strings(i) = SPACE$(i): NEXT
  16.  
  17.  
  18.  
  19. FOR SearchSize = 1 TO 20 STEP 5 'run the list multiple times for various size search strings
  20.  
  21.     'Generate suitable search string
  22.     search1$ = STRING$(SearchSize, "A")
  23.     search2$ = "," + search1$ + "," 'comma before and after
  24.  
  25.     FOR i = 1 TO Limit: word(i) = "": NEXT 'reset old list
  26.  
  27.  
  28.     FOR WordNumber = 1 TO Limit 'the number of words
  29.         IF WordNumber MOD 10 = 1 THEN 'every 10th word, no matter what, is one we want to look for
  30.             word(WordNumber) = search1$
  31.         ELSE 'otherwise, make the word junk
  32.             FOR i = 1 TO INT(RND * 15) + 1 'up to 15 characters of junk in the spam "words"
  33.                 word(WordNumber) = word(WordNumber) + CHR$(INT(RND * 26) + 97)
  34.             NEXT
  35.         END IF
  36.     NEXT
  37.  
  38.     'Words are now generated.  Now let's form our two similar lists for searching.
  39.     list1$ = "": list2$ = ","
  40.  
  41.     'Size/Word list
  42.     FOR i = 1 TO Limit: list1$ = list1$ + CHR$(LEN(word(i))) + word(i): NEXT
  43.  
  44.     'Comma Delimited list
  45.     FOR i = 1 TO Limit: list2$ = list2$ + word(i) + ",": NEXT
  46.  
  47.     'Wordlists are now built.
  48.  
  49.     IF _MEMEXISTS(m) THEN _MEMFREE m 'free the old mem block if we need to, from the previous run
  50.     m = _MEMNEW(LEN(list1$)): _MEMPUT m, m.OFFSET, list1$
  51.  
  52.     template$ = "##.### seconds to find frequency of " + search1$ + "."
  53.     template$ = template$ + "  Frequency = ###"
  54.  
  55.  
  56.     _DELAY 1 'a delay so we can watch the tests
  57.     PRINT
  58.     PRINT "Running Speed Tests on "; search1$
  59.     PRINT "SIZE/WORD: ";
  60.  
  61.     t# = TIMER(0.0001)
  62.     FOR z = 1 TO Repitition
  63.         k = 0: l = LEN(search1$): i = 1
  64.         DO UNTIL i > LEN(list1$)
  65.             $CHECKING:OFF
  66.             index = _MEMGET(m, m.OFFSET + i - 1, _UNSIGNED _BYTE) 'Get the index directly, skip ASC function call
  67.             IF index = l THEN 'if lengths don't even match, we don't need to compare words
  68.                 'just jump to the next one
  69.                 _MEMGET m, m.OFFSET + i, Strings(index) 'get the string direct from memory (like MID$ in Pete's demo)
  70.                 IF Strings(index) = search1$ THEN k = k + 1
  71.             END IF
  72.             $CHECKING:ON
  73.             i = i + index + 1
  74.         LOOP
  75.     NEXT
  76.     t1# = TIMER(0.0001) - t#
  77.     PRINT USING template$; t1#, k
  78.  
  79.     PRINT "INSTR:";
  80.     t# = TIMER(0.0001)
  81.     FOR z = 1 TO Repitition
  82.         k = 0
  83.         DO
  84.             seed& = INSTR(seed&, list2$, search2$)
  85.             IF seed& = 0 THEN EXIT DO
  86.             k = k + 1
  87.             seed& = seed& + LEN(search2$)
  88.         LOOP
  89.     NEXT
  90.     t1# = TIMER(0.0001) - t#
  91.     PRINT USING template$; t1#, k

Things are running so quickly here, we're having to run a search on a list of 100 words, 10000 times, to generate any significant times for comparison.

CONST Limit = 100 'number of words in our list
CONST Repitition = 10000 'number of times to search the lists

I was curious if the length of the search$ made any real difference, and it doesn't really seem to affect much from my testing. 



Notable changes in this and your routine:

            index = _MEMGET(m, m.OFFSET + i - 1, _UNSIGNED _BYTE) 'Get the index directly, skip ASC function call
            IF index = l THEN 'if lengths don't even match, we don't need to compare words
                'just jump to the next one
                _MEMGET m, m.OFFSET + i, Strings(index) 'get the string direct from memory (like MID$ in Pete's demo)
                IF Strings(index) = search1$ THEN k = k + 1
            END IF

We strip out the use of ASC, MID$, and don't even bother to get the word if the two lengths don't match...

And, to keep it fair:

        DO
            seed& = INSTR(seed&, list2$, search2$)
            IF seed& = 0 THEN EXIT DO
            k = k + 1
            seed& = seed& + LEN(search2$)
        LOOP

I basically stripped out that extra line where you were calculating x$ for some odd reason with your search...


Times on my PC are basically 0.015 seconds for SIZE/STRING storage and lookup, verses 0.025 seconds for INSTR/DELIMITED storage and lookup.

Logic says jumping and skipping searches would be faster than searching byte by byte, but once we start tacking on the overhead associated with our function calls, the gap closes quickly.  Without using _MEM to replace ASC and MID$, I couldn't seem to top the speed of INSTR.. the overhead was just too great.

Which now leaves me pondering -- why do the changes I made to the hash table in the post after Ed's make it so much quicker than the previous versions on my machine?  I guess that'll be a mystery to sit and study on tomorrow.  For now, the bed is calling my name...
Title: Re: Binary Search Method
Post by: Pete on January 28, 2019, 04:10:22 am
No strange reason for including x$ = MID$(s$, seed& + 1, LEN("dog")) in the INSTR version. I put it there because it existed in your version. I didn't want it to run faster just because it was left out, even though in the INSTR version it doesn't need to be there to count frequency.

What I don't get is Marks test. INSTR shouldn't ever run as slow as he reported.

Pete
Title: Re: Binary Search Method
Post by: STxAxTIC on January 28, 2019, 06:49:26 am
Alright, been a few days since I actually ready at any of these threads, and at a quick glance... sigh...

Steve, did you really insinuate that I'm somehow "obsessive over" ... what was it... linked lists? That kind of jab is SO uninformed and misled, just like much of... well... everything you say. Let me remind everyone else that you had no idea what you were talking about with respect to linked lists until I took it upon myself to school you. Do you remember that from like, just days ago? If yer gonna name-drop, open a new thread and we can hash it out there. No pun intended. Actually pun intended. Something tells me you had no clue about hashing til my demo, either.

Alright, don't mind my injection. Get back to you were discussing here - talking about searching databases using INSTR, rediscovering obvious properties of QB64, etc etc... bravo guys. Keep up this awesomeness.
Title: Re: Binary Search Method
Post by: SMcNeill on January 28, 2019, 08:38:24 am
Steve, did you really insinuate that I'm somehow "obsessive over" ... what was it... linked lists? That kind of jab is SO uninformed and misled, just like much of... well... everything you say. Let me remind everyone else that you had no idea what you were talking about with respect to linked lists until I took it upon myself to school you.

Sure, I remember.  You didn’t like the NAME of a library which links lists together, so you decided to school me over my naming sense.  Programmers name their code and libraries in a way that they can easily relate to and remember what they are, which is why we don’t see Library Foo1254378 floating around very damn often.  For a set of code that links the behavioral properties of two lists, I still insist “Linked List Library” is fine. 

I wonder if you’re the type to also cry like hell when someone talks about computing on an Apple, claiming, “But that ain’t no red, juicy fruit!”

Quote
Do you remember that from like, just days ago? If yer gonna name-drop, open a new thread and we can hash it out there. No pun intended. Actually pun intended. Something tells me you had no clue about hashing til my demo, either.

Nope.  Never heard of it.  I didn’t know we had hash tables inside QB64.bas, or that idet$ is basically a Linked List.  I didn’t realize that binary search works with odd distribution lists like: “Apple, Bear, Sandcastle, Sasquatch, Sediment, Segment, Sip, Siren, Soap, Soda, Suds, Supper, Zebra, Zoo, Zoology”.  I certainly didn’t know that larger tables resulted in more collisions in a hash table, rendering them less efficient as they grow, or else I never would’ve pushed for bigger datasets to test with — like, “I think the burden's on you to do the speed test m8 (for a large data set with lots of performance demand)”, or  “adjust it for a dataset size typical of a respectable database. A few billion if you can.”

Why, if it wasn’t for you “schooling” me so well, I never could’ve worked for years in the database field and written programs to decode various formats to make them available in QB64; never could’ve written my own database library; and never could’ve gotten paid for hashing out real world solutions to billion record issues... 

Quote
Alright, don't mind my injection. Get back to you were discussing here - talking about searching databases using INSTR, rediscovering obvious properties of QB64, etc etc... bravo guys. Keep up this awesomeness.

Thanks for your permission.  You don’t know how close I was to giving it all up and learning physics, before you gave me permission to carry on.  I truly appreciate it.
Title: Re: Binary Search Method
Post by: Pete on January 28, 2019, 10:58:30 am
Bill, I was going to expand my obvious discussions of QB64 INSTR() usage into theoretical ways it can be used to formulate words. Is it OK with you if I call that STRING$ Theory?

Pete :D
Title: Re: Binary Search Method
Post by: Dimster on January 28, 2019, 11:42:03 am
I have number of data bases which are just numbers. At present when searching I convert them to strings. I gather neither Hashing or Binary offer a way to search without the conversion step?
Title: Re: Binary Search Method
Post by: bplus on January 28, 2019, 11:47:15 am
No strange reason for including x$ = MID$(s$, seed& + 1, LEN("dog")) in the INSTR version. I put it there because it existed in your version. I didn't want it to run faster just because it was left out, even though in the INSTR version it doesn't need to be there to count frequency.

What I don't get is Marks test. INSTR shouldn't ever run as slow as he reported.

Pete

Well it did. Though both methods were flawed in my test, I think what the results point to is still valid ie INSTR is great for short strings but when the strings get to a certain length the Binary Search method will come into it's own as the hash method would with truly large amounts of data.

Pete, I suspect you haven't run INSTR search on nearly half a million word string, with 50,000 lookups in a row.

PS the time was not really that bad except when comparing to Binary Search Method.
Title: Re: Binary Search Method
Post by: Pete on January 28, 2019, 12:55:27 pm
Well you need to run my long string INSTR() test then, because it is fast. That's why seeding was made part of the keyword. Without seeding, I agree, INSTR is too slow, as in your example. With seeding, I can't see any reason to jump to a more complicated way of constructing data to parse it. Too many collisions with hash tables and more and more difficulty trying to figure out ways around that. Words over 9-characters and Steve's indexing method has to be adjusted to read and interpret more than one number representing ascii character. That's not hard, but takes a bit more processing time to achieve.

Anyway, let's just look at that simple INSTR() method, without seeding, for a one time place lookup with 50 loops and that long-ascii string I made up for the seeding example.

Code: QB64: [Select]
  1. WIDTH 80, 55
  2.  
  3. s1$ = "4bill3cat3dog7dogfood4pete5steve4bill3cat3dog7dogfood4pete5steve4bill3cat3dog7dogfood4pete5steve0"
  4. 'convert format...
  5. FOR i = 1 TO LEN(s1$)
  6.     index = VAL(MID$(s1$, i, 1))
  7.     x$ = MID$(s1$, i + 1, index)
  8.     i = i + index
  9.     news$ = news$ + CHR$(index) + x$
  10.  
  11. FOR i = 1 TO 15 ' Concatenate to a string size of 3178497
  12.     news$ = news$ + news$
  13.  
  14. s1$ = news$ + CHR$(1) + "*": news$ = ""
  15.  
  16. news$ = ",bill,cat,dog,dogfood,pete,steve,bill,cat,dog,dogfood,pete,steve,bill,cat,dog,dogfood,pete,steve,"
  17. FOR i = 1 TO 15
  18.     news$ = news$ + news$
  19. s2$ = news$ + "*,": news$ = ""
  20.  
  21. word$ = "*": x = 1
  22. LOCATE , x: PRINT "INDEX Search"
  23. t# = TIMER
  24. FOR j = 1 TO 50
  25.     FOR i = 1 TO LEN(s1$)
  26.         index = ASC(s1$, i) ' Good for words 9 letters or less.
  27.         x$ = MID$(s1$, i + 1, index)
  28.         IF x$ = word$ THEN EXIT FOR
  29.         i = i + index
  30.     NEXT
  31.     LOCATE , x: PRINT j, i
  32. PRINT USING "###.### seconds lookup"; TIMER - t#
  33.  
  34. word$ = ",*,": x = 40
  35. LOCATE 1, x: PRINT "INSTR() Search"
  36. t# = TIMER
  37. FOR i = 1 TO 50
  38.     place = INSTR(s2$, word$)
  39.     LOCATE , x: PRINT i, place
  40. LOCATE , x: PRINT USING "###.### seconds lookup"; TIMER - t#
  41.  
  42.  

INSTR() won that footrace, which is why I'm finding it hard to understand how it failed so miserably in your use of it. I mean it appears here that INSTR(), however it reads the string, reads it faster than skipping by an index and looking for a word match. Now if Steve's function was applied to it, maybe it would be different, but this is why I was asking Steve if the FOR/NEXT index example I came up with would be a fair comparison.

Pete
 
Title: Re: Binary Search Method
Post by: SMcNeill on January 28, 2019, 01:12:24 pm
Well it did. Though both methods were flawed in my test, I think what the results point to is still valid ie INSTR is great for short strings but when the strings get to a certain length the Binary Search method will come into it's own as the hash method would with truly large amounts of data.

Pete, I suspect you haven't run INSTR search on nearly half a million word string, with 50,000 lookups in a row.

PS the time was not really that bad except when comparing to Binary Search Method.

All 3 methods are valid tools for a programmer to use. 

INSTR is the quickest and simplest to implement, and like a Bubblesort, is more than sufficient for many applications.  Its ease of implementation makes it a desirable work horse.



Binary Search works great for organized lists of data, or indexed data.  In many cases, it’s easier to use than a Hash Table, and fast enough to run in almost all applications.  The memory requirements are lower and it works perfectly fine with external files — all things to make it a desirable process to learn to implement.

To see the usefulness of indexes with Binary Search, think of of database with 3 fields — Name, Age, Phone.  Fill it with 10,000,000 records...  Now, who’s the youngest person on the list?  Oldest?  Who is listed 1000th alphabetically?

If you maintain indexes of the data, that information is instantly available, even if the data itself is unsorted. 

The DataRecord:
Joe, 86, 555-1234
Pete, 62, 555-2345
Mark, 16, 555-9786

Name Index: 1, 3, 2
Age Index: 3, 2, 1
Phone Index: 1, 2, 3

So youngest is DataRecord(Age(1)).
Oldest is DataRecord(Ubound(Age))
DataRecord(Name(1000)) tells us who is listed 1000th alphabetically.

And for a look up, we just binary search the proper index...  Instead of IF search$ = DataRecord(RecordToCheck), we’d make that condition IF search$ = DataRecord(IndexDesired(RecordToCheck)), in our binary search routine.

Indexing and Binary Search make for a powerful organizational toolset.



Hash Tables are, of course, the fastest method for looking things up (provided there’s not a ton of collisions to process.). Pete’s old spell check routine is actually a form of a hash table; he just didn’t know it.  He was hashing by the first and then second letters of his word, similar to:

Hash = 100* (ASC(word,1) - 64) + _CEIL((ASC(word,2) -64) / 13)

A word starting with AA would hash to 100 * 1 + _CEIL(1 / 13) = 101
A word starting with AM would hash to 100 * 1 + _CEIL(13 / 13) = 101
A word starting with AN would hash to 100.* 1 + _CEIL(14 / 13) = 102

Letters from Aa-Am in one list, An to Az in the next...

Pete was using a form of a hash table; he just wasn’t using a very efficient one.  (Which is why I said a simple binary search would have probably been better in his case.)

A good hash table has almost instant look ups, but it also requires more memory.  It can require finding the best formula for data distribution and lookup, but it’s still an invaluable tool for programming.

All have their merits and drawbacks.  You just need to choose which suits your current needs the most.
Title: Re: Binary Search Method
Post by: Pete on January 28, 2019, 03:22:05 pm
So we are finally in agreement that my way is the best way! :D :D :D

Actually I did realize my old form of alphabetizing with folders was a type of hash process, but I always considered it a bastardized one. What I wanted was a hash method without collisions, but after a brief look at permutations, I punted on that idea.  Numbers way too big for words with over 20 characters.

QBasic ran interpreted, which slows things down. QuickBASIC was faster and you could use integers and include/exclude libraries to speed things up more but I don't think I ever got a WP in QB that could outdo Windows Word for speed of spell checking. I assumed those developers came up with a hash table loaded into memory, but I could be wrong. Whatever they used, it was damn fast.

Pete   
Title: Re: Binary Search Method
Post by: SMcNeill on January 30, 2019, 03:01:29 pm
I have number of data bases which are just numbers. At present when searching I convert them to strings. I gather neither Hashing or Binary offer a way to search without the conversion step?

Sorry Dimster; in all the brouhaha, I somehow missed your post earlier.  There’s no reason at all to convert them.  Just search them as they are.  (I’ll post a demo soon for you.)
Title: Re: Binary Search Method
Post by: SMcNeill on January 30, 2019, 03:38:51 pm
Using the SET Library I just posted here ( https://www.qb64.org/forum/index.php?topic=1015.0 ), we see a quick demo of using a binary search method on numbers, without needing to convert them to strings:

Code: QB64: [Select]
  1. '$INCLUDE:'SET.BI'
  2.  
  3. DIM NumberArray(0 TO 100) AS LONG
  4. DIM m AS _MEM: m = _MEM(NumberArray())
  5.  
  6.  
  7. FOR i = 0 TO 100
  8.     NumberArray(i) = i 'toss a set of sorted numbers into the array
  9.     'For ease of reference, the numbers are going to start out being the exact same as the array index.
  10. PRINT "Values = Index"
  11. PRINT "The number 12 is in the "; BinaryMemSearch(12, m); " spot."
  12. PRINT "The number 37 is in the "; BinaryMemSearch(37, m); " spot."
  13.  
  14. 'And now to multiply those values by 100 so we can see we're not just parroting the index:
  15.  
  16. FOR i = 0 TO 100
  17.     NumberArray(i) = i * 100 'toss a set of sorted numbers into the array
  18.     'For ease of reference, the numbers are going to start out being the exact same as the array index.
  19. PRINT "Values = Index * 100"
  20. PRINT "The number 1200 is in the "; BinaryMemSearch(1200, m); " spot."
  21. PRINT "The number 3700 is in the "; BinaryMemSearch(3700, m); " spot."
  22. PRINT "PRESS <ANY KEY> TO CONTINUE"
  23.  
  24.  
  25.     SLEEP
  26.     CLS
  27.     'And now to randomize some numbers
  28.     DO
  29.         WeHave50Somewhere = 0
  30.         FOR i = 1 TO 20
  31.             r = INT(RND * 100)
  32.             NumberArray(i) = r
  33.             IF r = 50 THEN WeHave50Somewhere = -1
  34.         NEXT
  35.     LOOP UNTIL WeHave50Somewhere
  36.  
  37.     MemSortSome m, 1, 20 'sort those 20 numbers so we can search them
  38.     PRINT "Index", "Number"
  39.     FOR i = 1 TO 20
  40.         PRINT "      "; i, NumberArray(i)
  41.     NEXT
  42.     index = BinaryMemSearchSome(50, m, 1, 20) 'find 50 amongst the records from 1 to 20
  43.     LOCATE index + 1, 1: PRINT "HERE=>"
  44.     LOCATE 23, 1
  45.     PRINT "PRESS <ANY KEY> TO SEARCH A NEW DATASET, OR <ESC> TO QUIT>"
  46.  
  47. '$INCLUDE:'SET.BM'

Now this makes use of a _MEM routine which I wrote to do the binary searching for us, but that's just so I can have one routine work properly with all data types, and utilize the speed of the _MEM commands.

If you look into the SET.BM file, you can see the actual subs there, and in this case what we're doing is basically:

Code: QB64: [Select]
  1.         CASE 4 'LONG
  2.             DO
  3.                 gap = (max + min) \ 2: IF oldgap = gap THEN gap = gap + 1
  4.                 IF search > _MEMGET(m, m.OFFSET + gap * m.ELEMENTSIZE, LONG) THEN
  5.                     min = gap
  6.                 ELSEIF search < _MEMGET(m, m.OFFSET + gap * m.ELEMENTSIZE, LONG) THEN
  7.                     max = gap
  8.                 ELSE
  9.                     BinaryMemSearchSome = gap
  10.                     EXIT FUNCTION
  11.                 END IF
  12.                 oldgap = gap: IF max - min <= 1 THEN BinaryMemSearchSome = -gap: found = -1 'it's not in the list
  13.             LOOP UNTIL found

It's just a simple binary compare of the array, comparing our search value vs the actual value.  There's no actual reason to need to convert it to a string, at all. (In fact, we're not actually converting it here.  We're just referencing it via the _MEM offsets.)

Title: Re: Binary Search Method
Post by: RonB on January 30, 2019, 07:48:16 pm
The absolute fastest TLU is a "direct" lookup - one where the argument IS the entry number (or index) of the entry, itself. For example, if you needed to validate the 5-digit U.S. zip code for 360 million address records, you could first construct a table (ZipCodeTable) in which there is an array entry for every possible zip code, whether valid or not. That's 100000 entries (numbered from zero to 99999). The entry for each VALID zip code will contain the number '1' (for valid). The entry for each INVALID zip code will contain the number '0' (for invalid). This TLU table requires 100000 entries (which, I believe, means that QB64 will generate code to maintain an array index table containing pointers for each of those 100000 entries). Processing is simple (assuming that you have already verified that the RecordZipCode was 5 numeric digits):
     zipcode& = VAL(RecordZipCode)
     IF ZipCodeTable(zipcode&) = '1' THEN process as valid ELSE process as invalid END IF.

Alternatively, you could create a single string of 100000 bytes, with each corresponding byte set to either 1 or 0. This TLU "table" (really a byte-string) requires 100000 bytes (no QB64 generated array index table). Processing is simple:
     zipcode& = VAL(RecordZipCode)
     IF MID$(ZipCodeTable,zipcode&,1) = '1' THEN process as valid ELSE process as invalid END IF.

I'm not sure whether QB64 would support a bitstring of 100000 bits; and while the "table" would take only 12500 bytes, it may take a lot longer to obtain the result.

Always trade offs. Bottom line: pick the TLU method that works best based on THE ACTUAL (or expected) DATA, depending on your requirements - array size, speed, simplicity, reliability, data placement (file, memory, web), etc. One size does NOT fit all.
Title: Re: Binary Search Method
Post by: codeguy on May 20, 2020, 05:40:09 pm
Another super useful data structure is a binary tree, in this example, implemented as an array. It includes searching, traversal and even presents output in sorted order. Two caveats: for sorted input or monotonically repeating datasets, the performance is terrible for reasons I haven't patience to discuss.
Code: [Select]
'treesortI64.bas
OPTION _EXPLICIT
TYPE minmaxrec
    min AS _INTEGER64
    max AS _INTEGER64
END TYPE
'**************************************
'* anyone claiming you need c/c++ to implement trees is telling you CRAP
'* This is a bit more complex than the standard non-copying version, but it is still
'* respectably fast. General complexity for TreeSort() is O(NLogN), EXCEPT when
'* presented with elements already sorted. One way to avoid this is to KnuthShuffle
'* the input first. Skipped in this implementation, but there is no reason you
'* can't do it prior to TreeSort(). Code modified/added from my repository. This
'* version allows multiple same-value nodes
'* Modified/added 26 March 2018.
'* Further modifed for _INTEGER64 sized indexes 20 May 2020
'**************************************
SUB TreeSortUsingBST (CGSortLibArr() AS DOUBLE, start AS _INTEGER64, finish AS _INTEGER64, order AS INTEGER)
    DIM TSAmmrec AS minmaxrec
    'GetMinMaxArray CGSortLibArr(), start, finish, TSAmmrec
    'if  CGSortLibArr(TSAmmrec.max) = CGSortLibArr(TSAmmrec.min) then exit sub
    'IF delta# = 0 THEN 'already sorted because they're all equal
    '    EXIT SUB
    'END IF
    DIM NilValue: NilValue = LBOUND(CGSortLibArr) - 1
    DIM x AS _INTEGER64
    DIM pointer AS _INTEGER64
    DIM free: free = 2
    TYPE TreeNode
        value AS DOUBLE
        left AS _INTEGER64
        right AS _INTEGER64
    END TYPE
    DIM tree(start + 1 TO finish + 1) AS TreeNode
    FOR x = start + 1 TO finish + 1
        tree(x).value = 0
        tree(x).left = NilValue
        tree(x).right = NilValue
    NEXT
    tree(1).value = CGSortLibArr(1 - 1)
    IF order = 1 THEN
        FOR x = 2 TO finish
            pointer = 1
            DO
                IF CGSortLibArr(x - 1) < tree(pointer).value THEN
                    IF tree(pointer).left = NilValue THEN
                        tree(pointer).left = free
                        tree(free).value = CGSortLibArr(x - 1)
                        free = free + 1
                        EXIT DO
                    ELSE
                        pointer = tree(pointer).left
                    END IF
                ELSE
                    IF tree(pointer).right = NilValue THEN
                        tree(pointer).right = free
                        tree(free).value = CGSortLibArr(x - 1)
                        free = free + 1
                        EXIT DO
                    ELSE
                        pointer = tree(pointer).right
                    END IF
                END IF
            LOOP
        NEXT x
    ELSE
        FOR x = 2 TO finish
            pointer = 1
            DO
                IF CGSortLibArr(x - 1) > tree(pointer).value THEN
                    IF tree(pointer).left = NilValue THEN
                        tree(pointer).left = free
                        tree(free).value = CGSortLibArr(x - 1)
                        free = free + 1
                        EXIT DO
                    ELSE
                        pointer = tree(pointer).left
                    END IF
                ELSE
                    IF tree(pointer).right = NilValue THEN
                        tree(pointer).right = free
                        tree(free).value = CGSortLibArr(x - 1)
                        free = free + 1
                        EXIT DO
                    ELSE
                        pointer = tree(pointer).right
                    END IF
                END IF
            LOOP
        NEXT x
    END IF
    DIM depth AS _INTEGER64
    depth = start + 1
    Traverse_tree CGSortLibArr(), start + 1, depth, tree(), NilValue
    ERASE tree
END SUB

SUB Traverse_tree (CGSortLibArr() AS DOUBLE, NextPtr&, depth&, tree() AS TreeNode, NilValue&)
    IF tree(NextPtr&).left <> NilValue& THEN
        Traverse_tree CGSortLibArr(), tree(NextPtr&).left, depth&, tree(), NilValue&
    END IF
    CGSortLibArr(depth& - 1) = tree(NextPtr&).value
    depth& = depth& + 1
    IF tree(NextPtr&).right <> NilValue& THEN Traverse_tree CGSortLibArr(), tree(NextPtr&).right, depth&, tree(), NilValue&
END SUB
Title: Re: Binary Search Method
Post by: codeguy on June 17, 2020, 01:59:03 am
One caveat: BINARY search as it's coded fails on descending lists, SO a minor change is necessary. Presented is code that works successfully for ascending and descending order.
Code: [Select]
'BinarySearchTestI64
OPTION _EXPLICIT
CONST OrderDescending = -1
CONST OrderAscending = 1
CONST OrderMonotonic = 0

DIM LowerBoundTestArry AS _UNSIGNED _INTEGER64
DIM UpperBoundTestArry AS _UNSIGNED _INTEGER64: UpperBoundTestArry = 0
DIM MainI AS _UNSIGNED _INTEGER64: MainI = LowerBoundTestArry
DIM MainWhere AS _UNSIGNED _INTEGER64: MainWhere = LowerBoundTestArry
DIM MaiTestStart AS SINGLE
DIM MainTestEnd AS SINGLE
DIM mainOrderchoice AS INTEGER: mainOrderchoice = OrderDescending
DIM MainRepPercentage AS DOUBLE: MainRepPercentage = 0
DO
    UpperBoundTestArry = UpperBoundTestArry + 1
    REDIM CgSortLbAr(LowerBoundTestArry TO UpperBoundTestArry) AS DOUBLE
    FillArray CgSortLbAr(), LowerBoundTestArry, UpperBoundTestArry, mainOrderchoice, MainRepPercentage
    MaiTestStart = TIMER(.001)
    FOR MainI = LowerBoundTestArry TO UpperBoundTestArry
        cgBinarysearch CgSortLbAr(), LowerBoundTestArry, UpperBoundTestArry, CgSortLbAr(MainI), MainWhere
        IF CgSortLbAr(MainI) = CgSortLbAr(MainWhere) THEN
        ELSE
            STOP
        END IF
    NEXT
    MainTestEnd = TIMER(.001)
    PRINT UpperBoundTestArry - LowerBoundTestArry; MainTestEnd - MaiTestStart
    UpperBoundTestArry = UpperBoundTestArry * 2
LOOP UNTIL UpperBoundTestArry > 67108863
PRINT UpperBoundTestArry

SUB FillArray (CgSortLibArr() AS DOUBLE, start AS _UNSIGNED _INTEGER64, finish AS _UNSIGNED _INTEGER64, OrderChoice AS INTEGER, Repetitions AS DOUBLE)
    DIM FillArray_RepeatStart AS _UNSIGNED _INTEGER64
    IF OrderChoice = OrderMonotonic THEN
        FOR FillArray_RepeatStart = start TO finish
            CgSortLibArr(FillArray_RepeatStart) = 0
        NEXT
    ELSE
        IF Repetitions > 0 THEN
            IF Repetitions < 100 THEN
                FillArray_RepeatStart = finish - INT((Repetitions / 100) * (finish - start))
            ELSE
                FillArray_RepeatStart = finish
            END IF

            DIM FillArray_u AS _UNSIGNED _INTEGER64: FillArray_u = start
            DO
                IF FillArray_u > finish - 1 THEN
                    EXIT DO
                ELSE
                    IF FillArray_u > FillArray_RepeatStart THEN
                        CgSortLibArr(FillArray_u) = CgSortLibArr(FillArray_RepeatStart)
                    END IF
                END IF
                FillArray_u = FillArray_u + 1
            LOOP
        ELSE
            SELECT CASE OrderChoice
                CASE OrderDescending
                    FOR FillArray_RepeatStart = start TO finish
                        CgSortLibArr(FillArray_RepeatStart) = (finish - FillArray_RepeatStart) / (finish - start + 1)
                    NEXT
                CASE OrderAscending
                    FOR FillArray_RepeatStart = start TO finish
                        CgSortLibArr(FillArray_RepeatStart) = (FillArray_RepeatStart) / (finish - start + 1)
                    NEXT

                CASE ELSE
                    FOR FillArray_RepeatStart = start TO finish
                        CgSortLibArr(FillArray_RepeatStart) = 0
                    NEXT
            END SELECT
        END IF
    END IF
END SUB

SUB cgBinarysearch (CgSortLibArr() AS DOUBLE, start AS _UNSIGNED _INTEGER64, finish AS _UNSIGNED _INTEGER64, what AS DOUBLE, where AS _UNSIGNED _INTEGER64)
    DIM BinarySearch_low AS _INTEGER64
    DIM BinarySearch_high AS _INTEGER64
    SELECT CASE CgSortLibArr(start)
        CASE IS <= CgSortLibArr(finish)
            BinarySearch_low = start
            BinarySearch_high = finish
            DO
                where = BinarySearch_low + (BinarySearch_high - BinarySearch_low) \ 2
                IF CgSortLibArr(where) < what THEN
                    BinarySearch_low = where + 1
                ELSEIF CgSortLibArr(where) > what THEN
                    BinarySearch_high = where - 1
                ELSE
                    EXIT SUB
                END IF
            LOOP WHILE BinarySearch_low < BinarySearch_high
            where = BinarySearch_low
        CASE ELSE
            BinarySearch_low = finish
            BinarySearch_high = start
            DO
                where = BinarySearch_high + (BinarySearch_low - BinarySearch_high) \ 2
                IF CgSortLibArr(where) > what THEN
                    BinarySearch_high = where + 1
                ELSEIF CgSortLibArr(where) < what THEN
                    BinarySearch_low = where - 1
                ELSE
                    EXIT SUB
                END IF
            LOOP WHILE BinarySearch_low > BinarySearch_high
            where = BinarySearch_high
    END SELECT
END SUB
Title: Re: Binary Search Method
Post by: Dav on June 19, 2020, 06:47:15 pm
As always that some nice coding, @codeguy

By the way....I just remembered something.  Yes, this code it's off topic, but so what.  Have a good one!

- Dav

Code: QB64: [Select]
  1. DIM SHARED HBC&, HBCM$
  2.  
  3. :::::::::::::::::
  4. :::: HBCDATA::::: 'Make HBC variable data
  5. :::::::::::::::::
  6.  
  7. SCREEN _NEWIMAGE(600, 600, 32): CLS , _RGB(255, 255, 255)
  8.  
  9. FOR y = _HEIGHT TO 0 STEP -1
  10.     _PUTIMAGE (0, y)-(_WIDTH, y + _HEIGHT), HBC&: _DISPLAY
  11.     IF y = _HEIGHT THEN PLAY HBCM$
  12.     _LIMIT 60
  13. FOR c = 0 TO -300 STEP -1
  14.     _PUTIMAGE (x - c, y - c)-(_WIDTH + c, _HEIGHT + c), HBC&: _DISPLAY
  15.     _LIMIT 120
  16.  
  17. :::::::::::::::::
  18. :::: SYSTEM::::::
  19. :::::::::::::::::
  20.  
  21.  
  22. SUB HBCDATA
  23.     v& = _NEWIMAGE(246, 264, 32)
  24.     DIM m AS _MEM: m = _MEMIMAGE(v&)
  25.     A$ = A$ + "haIkM^6cfUMFGoU_#81nVVa3QH`h7j78#DX29D]MB`VXQ?IRT0Y=6>h<Ej87"
  26.     A$ = A$ + "JSU4JRUQDBKZdF5]cdQ8PU48E>Zb1[1JXSYUFjdQJ[Ha#DB78N8`7D3D>HWf"
  27.     A$ = A$ + "ifWOkgWO?_oOG_[mN_fggkk3?gg[[T[LOL_g[mJOmojdjJ]FKfdY>eY>eY>e"
  28.     A$ = A$ + "Y>MGjWhWHcV7nQg\iaOlSM;YCWJdWhCL\K1CC_[geV=OOOOKflkmkM\K9kGj"
  29.     A$ = A$ + "iN^Um\PokM^cmhWnYgO]]>EWhI6mkb]B8;S<mcl<kWfeSmHgEVh[i[Y]o;o_"
  30.     A$ = A$ + "39n2<]b]]d6_]Bo2o2gkn\EK_87eae7NRWEO3O3Kfl#?dmgo3cgm4?ad7?hW"
  31.     A$ = A$ + "lH1O_fDfVVR#FKYbM[1mjNMgkjbk?GY\o5oZKQ:aeMJo#HCN:\L=N:\MYOFL"
  32.     A$ = A$ + "^GC:?oOaOaCoOiggWjG6S#6g[;f^?GYln\EL=?7lHnRnRf_]_;E2<mDHIjgi"
  33.     A$ = A$ + "i5LnoVkiQhYi`MK3UjkWc7gSUoNYlnYNN0f5:_?a^#;dfHS_C;ShIA9Fn[h["
  34.     A$ = A$ + "h^fH;S7U?gZoCLlk[L6]4O[cfkQ<njd?QcI:eegJL>Y<gBR=2mSmHaKSBm]l"
  35.     A$ = A$ + "<JZlLaHCTHoS5Q=jF\1Wi^j#S_f5kAXoQk;lUjDWf6oQBI^iRASn2`nYnSFm"
  36.     A$ = A$ + ";H:bl1_[man\jDCOCZkLZlkTHjS]_U43`HHEjW1?WoWmYoNVgOdeTS7kf83R"
  37.     A$ = A$ + ">EdGV]kCI\MINDJeOXF_ohlUnmT<bSKKK^\?NK>GeNFAKM=l94M4kjiI>ORh"
  38.     A$ = A$ + "loene^>mYeXUN?#oY]S?o?okZ_Wf=g7S93\_`e8SD;g4I^`Ke?cBmV[]\2dJ"
  39.     A$ = A$ + "8_DNnFJn_07gR?DUHJ`#ia>ecC>FdcR<ANo>FO<L=oD]=`chd_R<>`BNZWg>"
  40.     A$ = A$ + "N[CY;ckiMM\ACOR\=am3bVlKUfmVc_9>7LL]T3LegEVK=\oU?C=_MIKM>Vn5"
  41.     A$ = A$ + "bIoLnR`cCJc;MLf;adLn]_2m=;agdDNcaA<UMVkLL8UGB:ScI>mVjgK63eD7"
  42.     A$ = A$ + "CJ;4k:#InMV2oTiKga09U5d?ZD6[VOEe^o:]aacllIeHjgJ9W#I]0\[fQZQ["
  43.     A$ = A$ + ";m_8ITkhoEC_O[jRQZU3^YlkZ5Vk5N>ZlMJ?Y\_T[efjcKihc[o;ic5iiRG9"
  44.     A$ = A$ + "moPcYoodVoDTeWcQBN9YDFUfNZo]V=QJjN[A9N:mG9_?6c^AYnFJSiaAKbnh"
  45.     A$ = A$ + "ijGbW1PkCM3L=ba_<aAceV6c6BNmfeaT=kW;S=L:^VoQ9>J^ag:o_fW^D<=m"
  46.     A$ = A$ + "edg>Vm\lJTb1SioL[ScL=K9UG_F`Td>BIo<nVF6W`BK772iUTBK_K3?FLOic"
  47.     A$ = A$ + "VD_Hf_GkIN]W[l<e_[DV=aeYmJd_GZc?SGUcCY=0O>FnlZd?QYbgL:;\[iEY"
  48.     A$ = A$ + "VLMFCnNnieblBFN?\TaO:o_8gUfMhl>V<JfNWkN?m7CiPba[OYSCCjc<?7Bi"
  49.     A$ = A$ + "TBoEV2GG:3Vf^BKhSIO]4COXTGBZFLAe\K3VPnFh\]D[oN<IX\?Y6^VW;edE"
  50.     A$ = A$ + "keSO[lN?egai_F^gU=f:n]BkIIKC<D]lc=U]SlL\[elAYLH;a2FV3dal5I>o"
  51.     A$ = A$ + "?CmO9nKZl4U]GjOWSBI4?WY?18;Ta3<W?ZUiMYdG`bnVa`eLLUjFbkiDNX\="
  52.     A$ = A$ + "E;nQ3UlB9UGoF7OTJamUDR3cnTBOibS=]9Rn^JhbbfGZOWncba#Q_[FL4e^?"
  53.     A$ = A$ + "cl4ai=U];O>?6Unn\[eWHYOS]Z;O\W=im#[W3>6kGZY;L\fK;a9F6SJjC#jc"
  54.     A$ = A$ + "K;bmiaFcnHVKPYZ9UJiT=k3[5S^4?OlLO<TG:_OWCoE=:kK:K?S=7B\_?aIl"
  55.     A$ = A$ + "NT7Z5CdDm_BiaUiiTc[][BoAZEkdYOCelM?ON?Fn4:mmJG7cRd>iBNfTm_f7"
  56.     A$ = A$ + "\4kN#:GVbW]T[m\?K>bW;jGOicobfB]n07cTBIUbh4B<MZoaWASMNBMIY=TB"
  57.     A$ = A$ + "oEAV:kcBKD7:iUTbkaFlMB:K?Un7VjQe6CR;`FiHadW?Cc__DGM]a]]4?;Uh"
  58.     A$ = A$ + "JeSKmAPlgHi5To[b796[Do2g?UnQ\6SGAih`e2EF3G:[dZ]D^G:eeWmKCAic"
  59.     A$ = A$ + "dFR5>aen<Xfe9OnUhe<W;C9[Tb9UgOd>iIKNNZECeYmN^?[iC_H8n]SQlBBK"
  60.     A$ = A$ + "C=NGnL_4GGVCi4C[oEeb?BImDTG3]E`cbJj4ZQW[eNhlDjcF9ccbBONcn9O6"
  61.     A$ = A$ + "Dj[Nb[Q?EUgWcAUmYLO:=7^>eCFjWPhQiXU>7Ylo>E>?P:7oaBo_hhZEo>YO"
  62.     A$ = A$ + "a`YOJSUKOj3:7o]BOh]=FcgOo]SQlBBieIKWOVUhjDN<ko:m_bLhbmDiccDO"
  63.     A$ = A$ + "XnADkIaDhIYbl=>V>nYV74eR[Z\MTn__6nDE:o>5EJ3X<oFUiP;o]<W4R]:k"
  64.     A$ = A$ + "KJQVBVXTZiC`DSiiHmk^N<D63#INC;Um:m[9WgjieSnUdojBmKLNhJEnm73i"
  65.     A$ = A$ + "UTBInFaeUm9U?3[M?]dJK\VmUbhbKE7G9f`h0:[CRiXJnm3SNkLLAF[lJVjL"
  66.     A$ = A$ + "]>IbLDhH3CL?Unn<EN1Wb^W7Gnm]>VZ;1GGJKH>olAn9em#OCjCN=O<bW?e8"
  67.     A$ = A$ + "[m^JnWEJ3IKZGd3]lBBie[eJU]<^_J?3Y?7\SaR\DZfLhM:kkCA]Jc[5bh0#"
  68.     A$ = A$ + "N0lTb3Y^^eZf2lI_aXD[^::ia`dii[faImm:UoF[jmbcmLhjBK3kjlDY\N9D"
  69.     A$ = A$ + "gef^>?Y?0kjJggQDN9YD7N[?ofWSiVdJ]nhDj3gnRfVnaUBeTMCl<b=cY_Sf"
  70.     A$ = A$ + "VeF^b\c=VO]BV7UFNN]dLWg2YNO`9Wj[E2kKi5e;]DmJFWhk;J]`e;M<Eg6:"
  71.     A$ = A$ + "SMMOYWC2iE`19^L]G7SP#^HOR=X?3kY[ilQjf2M8UG>eXiZkUFY<^[m5^nH]"
  72.     A$ = A$ + "fkeYKWdU\lbJQ[?4[ECInl>emo[C7OjBFNIKbWG=JK67_U#Uca\>eYYX;MiU"
  73.     A$ = A$ + "U<V9CA[UMoaXFGWRjDWP^dUGLLJgeeO_LnQ]?FWXcJ`iBcWZ>]LZ;_\Ndn<g"
  74.     A$ = A$ + "^Kkh9dY;CZ;_dY>eY>eY>eY>eY>eY>eY>eY>eY>eY>eY>eY>eY>eY>eY>eY>"
  75.     A$ = A$ + "eY>eYePHNif[]`>eYc7:W[mkSjH_CMZC7NjBN^f_=dnNmLYCMJ=X4G_?F7k>"
  76.     A$ = A$ + "WXLocJ\eJd>eYC1:WWhkce=^c1:G3PGkm9S>eYeTBLmUfjhgBYeJ]dYCMJOC"
  77.     A$ = A$ + "7Re2ic5J]F;MjDWFKRLnhj6mBg[;UL_3lBJNjG^O4eY>U4h1`#;Mo522l4if"
  78.     A$ = A$ + "2ldBGCg;g7]JM?h]T:gkeQGbmRg3^>T?eNCF;Tk=Bef3RGC:G?h_dFk]kdh4"
  79.     A$ = A$ + "b^Uk_e;aFGini6L[[;oLMZ]OMTkKg]4_hHk3M]^nFBkXfN3l^\_KD^7Z^[4j"
  80.     A$ = A$ + "M[=^0ih6L9^^oeYjD9VN9hQaf7e0[>Wm^L_aR[GJS>aecEC:iNi3_26=UeWS"
  81.     A$ = A$ + "b[[km?C53PkgkC=NB;9>2_nCmobkWYf[Fg7kc9Mjf7Ub;h3gBFK8cmjH>Flg"
  82.     A$ = A$ + "\V]GgOehJ1ckCo^iHdgM[KdlgW;?#imPj04K>WnYL][^fNWI]H0be5gaJKiN"
  83.     A$ = A$ + "8gDf#KM_BKZe=oef_P>MkVbl]hNlL^o1>5^6<L=L9DiN?oLLY?oIKXUah:]i"
  84.     A$ = A$ + "3Uf_Wb7nbm\_lnMZH0bSI\fFj3dH]Qb[oHnlW?WZYWZEo2jdU1UfMBOMA>L<"
  85.     A$ = A$ + "O<K9Fl<?>V;=c55LNle^>UkcbCAYmFmkgSOZL816>]1Vif:]QFR9[]G5?EKJ"
  86.     A$ = A$ + ":KjUgWSU_aLoD[V]oN\eMBZdFM;DRIVbFD:kG:CG6;LYm^Ja[?5UnPCLZ9VN"
  87.     A$ = A$ + "<KYUkieUkc2Ik_ln;aHeRk_fnfK=LMimiDke3c]gE=WO1MjbQBicFbiKYm=m"
  88.     A$ = A$ + "9\6f9?gUbQC=G>:7K[Fl3?S]DldB`delg8_fU]a<WJUgKYN_<_OUhjJj^6;g"
  89.     A$ = A$ + "PIObHnOXo?milFWBIgiT76K\XZIO0IKaoeR5dO[dFM6_^k3kniabaK]ld1VI"
  90.     A$ = A$ + "<ZfhKGREbL1FkNLZLDUhelnAoS=?jUjQ6c7llLHL6MZCCA]>W::cGN6KL]S;"
  91.     A$ = A$ + "omBm5IL[I\P9N#LiDaabi]<N0aBe8`CUi^_\M0V_fhme2^6<IN>]FHBL;7Gn"
  92.     A$ = A$ + "id>LY>2>OeJ;;]NOjdU7UbWe\7R=jDfBlfHj3:U][5_HJ3VSU[AN<UfJConi"
  93.     A$ = A$ + "oUGKo^dnG9nWfDI=c0f?K7L=bOWgWWcY6[HRO9cmHim#ZC<?ofOGZCR[EiiK"
  94.     A$ = A$ + ">O`AG9m;mH[k4D:K[\6bE8ODJ;<]3VbMlohocgUHSaR_]VLnDfJ;c[LfNE>^"
  95.     A$ = A$ + "lo0>5ilbL1ThYaJ7LO#kN:OIbn]B^F=[R_5X;XFMY>eh1JMm<VO3#UiXKOG_"
  96.     A$ = A$ + "ZMjdWZUgfJf8:c=4f0Wjh#FM\LFEkHa^nDSC>fd5G`[PCJ97je`[im^Sm6WC"
  97.     A$ = A$ + "`?Y?^YN]JDY?bcM?<6EKl5bfKZ3]fi?SEI\oCW^lXJihF\iDiB[lh\NbJA^B"
  98.     A$ = A$ + "OSQGCi`Bo<hmdVfVHBCKVSAdf1G]6c;LlS`L5V]gbj]YTD_#joBWjTTbXmi]"
  99.     A$ = A$ + "km?^n7cjgZej#]CMjBWdokY\aVhYSY=`UDWnMZCGbTaSR_[SiON6kla2?]=e"
  100.     A$ = A$ + "2HWjdU<I<h>O<C\KF78]>?Eg7DV_nMInOgY>MY#SU3jBnH^6:UiT\?OZkDWJ"
  101.     A$ = A$ + "S:73YF67j35UeTGg7l>eYUC>VK>oAgfa5K]KCmego>eYc?Z_G3fY>eY>eY>e"
  102.     A$ = A$ + "Y>eY>eY>eY>eY>eY>eY>eY>eY>eY>eY>eY>eY>eY>eY>eY>eY>eY>eY;1jWk"
  103.     A$ = A$ + "gioiNiLnDoZOd=_egc_lVOfO]oSKn0oAo9EoOoWNQ7TGS[iCldO`mbmEWjdY"
  104.     A$ = A$ + ">QL?HPGo?noV1l`JBLNoFnnoG\i[k=nF6H>oO\?cWm^lEHg7Pmg2N9g7Ln1;"
  105.     A$ = A$ + "k[L?][hJ>nmAOCWjD[TbP_jWl7L#>\5IJnON<NL[IK1\=_Wc=HLNOE<mE<fc"
  106.     A$ = A$ + "ijcoUSM<kkeXgaCo\373hI>>eG1_]]o_gO^OU1mAo=Oe_^]nldY>][4bNR9D"
  107.     A$ = A$ + "6O:BIO]aaaPob[19^F\=HJaZ?Pmj[n\oo47]4LMJGUS36L9OgBYDGW]k>eYS"
  108.     A$ = A$ + "11ND\<bh:W?5>EoEcooLf[GROklo4Wb[3fZ[h3NJKUf3h9S7_eH]cf_nT_=n"
  109.     A$ = A$ + "?kaUjjjfZk4bD8CXoWK3QlD[nA;9FSS1l`?`kogLBiJaBROdGii\akeYelBa"
  110.     A$ = A$ + "o3Lm=aFG9VJmmfcW_GKk;Y?#oEffno<^LeE<GOBW>o9aeKS\UfI<O^;ihe_A"
  111.     A$ = A$ + "RcdH=iea>NKWZ70l=nL>U<\n4X_^l:GSJWKaHHWn[n[oENS=Knn4OWaD[^\]"
  112.     A$ = A$ + "9nJ_^j;ofVW0kgEGG7Gg98a;R?GB>I1gX\]HSFU?iH=N#HR]L:BM7N]4;=EK"
  113.     A$ = A$ + "=m;eS_fa`e6Ll;o[me>PYUo;lGiU?lmI^c0oX>9ea#kW?g:^F<Xj7hJ\]i:`"
  114.     A$ = A$ + "k9miZ5OHjdi?9N<aV]:GXn0ckSHWUNm=OEH;5<6nTKkA<XHdUV;LUe=OBL^;"
  115.     A$ = A$ + "9nN`FPUO9o=nj^Qo[mboK=lMPgBL=ggHSefRhbF`eL^lJUjBhm`Z_H9mON?2"
  116.     A$ = A$ + "gS]^C#ILW;9FKT?mh`?Eae;aN_fMeWG]kYo_e8aD]:oJ^]WB70oNJW>]Gc[d"
  117.     A$ = A$ + ">o_lI^KmWTnRKkFMK=eNN7o#3WG>^GmCmOH#?fKjWoI7dg`m>ofSm?n]=iiX"
  118.     A$ = A$ + "L<ihmL?dYcG:mG]eoOVCFhil7<cQUHjL\XKiJWiNFo`e>]nVGkl#kSO[ELM6"
  119.     A$ = A$ + "ojH];J3RQUaFm3mWiQ6No_a_ec]iiNn=3HKagL^6`VGkoM[i;3<;G;`dLo2V"
  120.     A$ = A$ + "FL=]En=`ncM?Qo4YMNJ;Mo_?OY<o_]P_BlL[eaH>fAJOGK]]4S87_H9>>U[Q"
  121.     A$ = A$ + "aGL[i[\]TnBX=mY^N_kgg7m6oLCOL;9`\PQo2OAO93hTGa[h^<_W_kANTGgV"
  122.     A$ = A$ + "OW?i_m0^FFmH>Nc;a_iGm[ikJhlnHoSng=P]UY?Pk9n]oYoho[WmL#k3M4`R"
  123.     A$ = A$ + "[mn\S_?=Y<^bUBVOD]KG;o_e^N8;\4iQL<GW;^^oh:SY;bQP[eNEjCZbX9V5"
  124.     A$ = A$ + "IMTUAV6\6o^ao2f\T`V7fdNd7iOiVGc[iIflBOYo37n_PgCRSmUl`?lV_R_h"
  125.     A$ = A$ + "oRKnNonn2_QOTGi;I`N=WW?eOhO`Vo_Oj=3l_oWjNS56]Oc]F;n?Pn1JkL<o"
  126.     A$ = A$ + "3NCo`?0^VfkGnGf;Kccn\S7c0m9LOoml]n:flOmGj9gl9Ooo[6hOnOagk06W"
  127.     A$ = A$ + "kUilS_C7NZ\^QGjaVS[3fo0[dJmIVi5O9f^clHEN<PY_7oI^1SWnS:^>]KQ<"
  128.     A$ = A$ + "NZW1L=O7_nclCmC_i7k9oA6\[bg1fdc;G3`QOg?nS?PAnQO\ok3<hKn_8oW]"
  129.     A$ = A$ + "]?n7kS_iUmGo;MP;aeoUocnWOhooWLU_gbL^EWTn2X]N`]CAlkfVC<]jdhL2"
  130.     A$ = A$ + "V]VN:YghgcS?X3jWm]o]?PW1K3CkTkI`ecI__C7NBLM>?OFB\bnocHNWcGa<"
  131.     A$ = A$ + "g;YoLFO6CML83InMAVTS?9adOf[2olColOj1\1OVSCoA5G[]K\U96F2L<WO\"
  132.     A$ = A$ + "ObkmcPC0_`iV[1<H_gfKh=_i;m?gOk=O;OcOX1Lmgf;nKM0WSlOZWimo1n03"
  133.     A$ = A$ + "hTGi;m_jmQ[1WkiVf<fYiEn<HW1MBG[?cHcW2?2a_KlkUjbXOh?eOjGdTaFc"
  134.     A$ = A$ + "aCKRfjkiMomO3^FKflKdW`mDW>]8mQ^\N5WS<N?[AQ\VS7;7D#V7Z4;[\gPM"
  135.     A$ = A$ + "U:aT:aaQL4n6Sl8_Z=akIoLcVGhJLmWmi_V_jmlO0GS=_BoA1G3FAL=H:`Y_"
  136.     A$ = A$ + "hoBOUKNk_UWJ`N=_2>5kGPSOQ[_>`lInM`eJ_6o`iho#_o?l0n4m0loXM0^^"
  137.     A$ = A$ + "dN=O6KfXoP[777ffimd^M\]X]VaQ?5aaS>1^WCL=nEQo4LNBo9:9^fdF;m17"
  138.     A$ = A$ + "6L>]iibiFW>lD>6C>N:]F7dV?e\VV5G?E^BiH:7O7`Z6k8HE9m55B\3bYJkP"
  139.     A$ = A$ + "?S=7>6Tae>=d0fnJL=H:\YPL<h]BKGUhJS;5L9hK^>l:GK`MlkH;3L3aV2fP"
  140.     A$ = A$ + "OWc=f[O[_nOZ1<^h?lG5KKXGPHX_cM^cV?W[h?_?g_PkSigTiclO^ceo?>G>"
  141.     A$ = A$ + ">gLN0?aeVk;`Od6hmdVX]AKDMId6nUN_o^3?KlNVf:W7noSAd^CO`C6\>hjY"
  142.     A$ = A$ + "R=_C7N:7CbQa=Y0G?gLd97khL>7iL9N<kePg5;W<O7bI6_7Hd<>C`VJoh]mZ"
  143.     A$ = A$ + "N]gHcP?[];`d_`ef]e7IH>Nc3=f^n9O_og6``bR[ilP>2`PR_0G0nVf6j#\="
  144.     A$ = A$ + "b[`lKlmb?d3mE=lJRGmk5GBKhKj[oeOce36m2Z31V_So3\7GN^]MUG;N?O7j"
  145.     A$ = A$ + "OX=b[L?`aR^6^?1Kcm<W;mk_6agciH0knSnMn0fZYOWSO^H1jdQU<oCUnA;^"
  146.     A$ = A$ + "^<o`UTnCWe2M]a;Zfeead1i>]IS?X8[X_QJkE6l:f6dOEU_8f_KT?OQ?m0ka"
  147.     A$ = A$ + "PodHW7I?>WgLM_B>7LLR7eN9_go3h>`EP3CF\IR;1c8C>jlmRC1GEbh;<\hG"
  148.     A$ = A$ + "In>o??eolO]=oTo0oF3_jaHKH8^U[^fN=`n:<];m?Pk5H_?imN_Co1m;Hc^4"
  149.     A$ = A$ + "OKl2dGGj3>fZ97OL=kdY7UcOP4Gg:n<[gJcoUeM<o6f5#Vaa\1VaS5IA`#JW"
  150.     A$ = A$ + "6I7H`\>F?R==oDPGAf3L<o?Te4GcaJ^\07[_a63>G7a^l:be8cSN5aXRa0k0"
  151.     A$ = A$ + ">:I`D9^B\NF3XjO\aCPm#h4CEJ7GO\Co_e7LIaU96eoShd1Kc5n1X>7e]`[f"
  152.     A$ = A$ + "f;]mc[dGP^0bO0ncSn1^>oX_Y_^k3CCo>miPiki;kdT\6\be#SL=iY5:WGcV"
  153.     A$ = A$ + "7<Uicl<MglBoI6`fPi#VY<gA`jOWhdK`eK^[]G\E`o1<=<HJ^>h3[HHUo5Ca"
  154.     A$ = A$ + "_?T_<bgVf0_bGl4c2WbnN?06]llUfnd_LI[;[TcL4Jo[ea:mMI<1bFo7eZgd"
  155.     A$ = A$ + "\^#\nC]MV]_dn<];^_Zic0aJCn0BMAZSQWO9^VH]aF=OOW>=9W[#Yl6b9l:b"
  156.     A$ = A$ + "E:[FFK5e8m_6I86S7TI#FRhm`nYa93^6KhP=;[=3c;<O?HO`aX;`h[1[Sl4b"
  157.     A$ = A$ + "KPW5S[=W1K`Q_YH_6fhEa6FJW]<FHa1gP5^>>1>Fj?ll1^cikI[<joFja\]\"
  158.     A$ = A$ + "i7alWkmLN_2>6l\HmdW4n>cWPmeX35KelnNLeWnTc]HU3\^7\^VdGI`DP97S"
  159.     A$ = A$ + "\^?#6baj1i3T=070fTLCkHbbi5mnUhJSc6\Xa6CnbdF<7CVoHd3TfQBK]jGL"
  160.     A$ = A$ + "Y?`Uaf?DcW4WL\69=5feh>bicm]16MBlL2mK9>gljYO=RciIPe]>mk6c3??j"
  161.     A$ = A$ + "d];2ILNnR]#lkBkS6cKj?=<H?`08KPnOadbHG0<3o?aeFOFLN[Q[ikY=PM1]"
  162.     A$ = A$ + "IX?_jWKnZfFDnDoB=VCWSS>g<g5kPL=e^>mE[5V;WC:7B\>fciJCoTi58cO_"
  163.     A$ = A$ + "i0Tkk<goXKX7Om]32oUeFVSA\eO0j]1?J<_FSVlNlIVWe4[F=L=W_QaQji_g"
  164.     A$ = A$ + "HFah0S=0`aUnPWn5KNYe6Si?:c1VhG77Wm9>XLMGI]H^?hNbL?N8afdWIoF="
  165.     A$ = A$ + "L>?kikAo]aGYO?MKgWf4nk2fYF^BLL=6d?OmH9I^W=^GdUWeKU<aJ3^VHZ5G"
  166.     A$ = A$ + "ch?aiRSY<G`ZG9abJ3AL\fPAFk#:oWlJ7_lfj?Qh?jWX?YeSSj8]UoWeV^mm"
  167.     A$ = A$ + "6c3nPI^jVZVejdaU\V3=GciHFnd_[_SKZK9`bP[cJS46OV73GklI`JY6MlI]"
  168.     A$ = A$ + "OPm07kf<?dILa6_O[bS]RU^]5SL9kL8M9n_ClB`;Y_TnNea2kHZiL#IZjF[C"
  169.     A$ = A$ + "7GRlT0>[\NPim>7T6NnMM]OhLI26oZ[i;>O6OWig1KK<c>VYXgGK>I<blMYl"
  170.     A$ = A$ + "dJR5d6Ti7l#Q]MNK\6W;ce]SEfQDGRiH`LOPNgNMROjBh;^imL\J<2K]>VbJ"
  171.     A$ = A$ + "cfH\;m5WJU6k2jOM>NDF?DUa9O8T;=oGLmfflDEj7nLnUkJC2lJdn=6hSAlh"
  172.     A$ = A$ + "9nVW]D7khcEW>M9`KPO=_II=1Km8<TKkX6?]F`6R?o:K]hk=\jd1oJlbI^[e"
  173.     A$ = A$ + "7^MAfBiKdOLXTWM]FcJ8I^a5CfaAKK^VHKfi7c=eMfei\^UhZM<<]F5F[hIh"
  174.     A$ = A$ + "iWi#K^iGFW>N4fS1ojhLUnRK]NRMM\ACncQI>5b[hC]ib6Kg4oUe]\jf=NJn"
  175.     A$ = A$ + "_[QLUaR_VSidL<G[bjIANX>FFAojM>YDVgZ\FaM\mV;?R6oaBcafLm5d7Tc3"
  176.     A$ = A$ + "_>MjAijdSnRk[j;>a:ClDilJbiPXe[\cinb_O]aOHgIKUAe?fFoo^V>06Sh="
  177.     A$ = A$ + "biXQeSYeQ9SkkHW7lmge:\Yd1THJJ[D[8okO7o?^Q]neM\WPiH<fKieDOFY>"
  178.     A$ = A$ + "Daemh[?m9dm2n]LNgJ]1klN2<;fW#nFL\eX\HI[3ce4;?6nFi>`FYOielmeH"
  179.     A$ = A$ + "D;aElmRi07?\n8NmHjX=J`dPUB<6c_IaJdOHNS5_U[U`il6;c_\SA\[]Jj30"
  180.     A$ = A$ + "j<hlVG_a`fI^7UKaFnB`fUhJ^OjSSmY?Q^G`_>GN`N=HJW_EH?7o[M^;WbP>"
  181.     A$ = A$ + "77g5iVDg#;iIbea8TQ7Z3V:nV?gh3kj>HYoeikE?F9V^6[mk1oE:WkDG_n6V"
  182.     A$ = A$ + "eS]ck2c]VmWlKcM]B\=GCeAXojlM[9f>IWk[ma`ndW#g;H:b;^[=MV[6bm5H"
  183.     A$ = A$ + "0aBnmFWe[U\bBbWVno:FAl2OOFSJUcYSQa3ojjUF\Ta_jlfDo\oWm6oe7eVI"
  184.     A$ = A$ + ";h>>nL>EGnOhgCOiiEWcXY?1?c?`WhVk?R7bO_L>UVLRggEKeUOWS]A?WIWn"
  185.     A$ = A$ + "4?SL<XP1SiLRbib\iaaWePeGKK1ehLM=#Fkd^64Tf_d>Xa2khUG>_7d6jfRM"
  186.     A$ = A$ + "G:[OkCIoE<I^f:`m1iT<WGFiL6_6^^D?gfJg^Dg]nFem1odW0CSMPBkcI\Ke"
  187.     A$ = A$ + "lcFoaFbhcZLAJ7]VMJWWail;fH7mmR9;I\32Wh9HamiJY#=<V7Obk:VFKcSM"
  188.     A$ = A$ + "^<FI_7=7M>6Q>N2`>O;mH73G3ce<Si6mK;Ml^bW?n<Y_6WMJCF;Y^VHQ=JNf"
  189.     A$ = A$ + "WSQC]WcI]NfZ\QbF8_R\U[Unil2FKaILcY_fYm\BKebCIC\lc:kW[Y8VOK7O"
  190.     A$ = A$ + "8SoeeHUBOZ[I76OWikClZigScAILiN_JnbEo7hjbl\K<4Io#=>e;AO]GWblk"
  191.     A$ = A$ + "Uc?milKgeMVnLandUL=T7i7RYfeEG`[CU7<[CRFS]N8oY>70_6?3C]Yik;Sc"
  192.     A$ = A$ + "T_CIbYTM[Q[5oZ=jTCkf83[o0>NL^F:H>ZYmkJIRn_DR;EGCYoc^^nJn[Lo`"
  193.     A$ = A$ + ";?NJ?ZK[\>Gbf#bNNXmd:^>agFCPiJVBiW[Q_EGCOl]>]9NnPMJSSN]Z7\BF"
  194.     A$ = A$ + "kE^OgPLEb96gG[9CF3S?6^>c9\n3H=ZUiNVkeiZK<Ze<HnOL\L>ljhhKE6W#"
  195.     A$ = A$ + "6oRS[6_BoKI<m^?QDKL][E[ZXSY<WQ]R[YFO]oEMZIo=_WkYJem3OGOMCidV"
  196.     A$ = A$ + "Lm;aeZ[i\A_;\Sk3fZ;ad9f^LNQUL;hJn<G3a^jGI^GQ`gYMiUDOWP_H]QSS"
  197.     A$ = A$ + "GL=j>dfFiJiX[]D9fdj5OHM?>JgP?Mo1\UfRko7h3#jO#ILeVc<NU?COC9NN"
  198.     A$ = A$ + "<>?Ng[bL]<]CW^4aEK=nJMN^?`d`J;;mo>acY=Sa`c]P[efYkmc[im#=keP["
  199.     A$ = A$ + "M?W=][?eiaHB0CbjX8g363oBdchlQ[L]:cH2bJO9[9EmI:k[Coc;mQ2OMBOK"
  200.     A$ = A$ + "jhjCKRjkfmG2\W>WLTe=eB`26?Fj3NY?gig;kl2;m9O^LnJ\i^Oc]?d?=4gb"
  201.     A$ = A$ + "eSOVi`NYenQf\M\W4g]=S3E^nVhHOkhRagQ^=laPe5U\>N<FjT;eaV\a_H\g"
  202.     A$ = A$ + "h;NO^GOJBj7^[3?R3[9oHNbBKjImKV[MfiWci]#:W8N>c=;b>9VfOSGaoaBk"
  203.     A$ = A$ + "ARUeFTc[QmilGLH=#oZfTeNVfJGJ]ihNiiJ=O<\MIlfjGO=M5>?#:Wo:>^Ji"
  204.     A$ = A$ + "cWJHMWKm]\GQeYS3iJZPSKE[fS5;Ua]UeZUf0DfXdWKUkljiHPinBYhjLmj`"
  205.     A$ = A$ + "JTdaS=[]T3aJS4nPSl\nPkhTebH3]_I[UMc7WS]fBdG<d?G\o4DjWUl`H2n`"
  206.     A$ = A$ + "?lP?OMjdSbm]AdofbcOe1X_K]bYLQfccc[n7R?gXW`HTMn=Ue#M>W9hHI?UK"
  207.     A$ = A$ + "W`5G?NaC]N5`eZL\WW3G_[[E:]jkQeZReJcfP[QceHJm7ScKVo37k>gg3kdY"
  208.     A$ = A$ + "7ac5gkkbW]>V^injDbJKH9HJm]3\K]jGcH5cHUeoF`DJk>ac[9kNT`Ho^[O4"
  209.     A$ = A$ + "iloPg?ehh_;[ghj_L;HCjC4GG6Om^dWH=2Uc;EG[IkaGOJCT;8eobcb<NGcG"
  210.     A$ = A$ + "RSEB^WF=fHFEj7^\edARG;idN\e`BioYEKcTk\biGh^Rk]6_XFB<fjQn^mdJ"
  211.     A$ = A$ + "UF[f[5oiLnd?_[[?:>_c]6cDGO?WhWn4[m_>??\nAMM[L:K_6W\[O7V3[bjb"
  212.     A$ = A$ + "8iUPaFSe\We4GkjSFFKY8S_fSWff`6[RnGE>Gdgen#GcKH]ciSn;ld7KAf>e"
  213.     A$ = A$ + "8YlPb4>g0V2GOcjCH<oQM^:WkiL[9fjDPMmGH^eiGd3<\n<\3[_I;QM=SP[i"
  214.     A$ = A$ + "J6W2??IMdQe=7G;ZaF#g?lCObeekbLL3feji^4OG63l]IFO\GceiCmQMODG^"
  215.     A$ = A$ + "7Bfe7;`e?kCm=O3kNK:[17oOn=kc][CARH\cLL3Ffa^bh\cJkfaJjHRYcj5M"
  216.     A$ = A$ + "H?dkZhYgfjAI\e7YMUGZ>2[1eCUm]0m1o=mZOig7VN<nomoiW^Sa?A8ZQO\Y"
  217.     A$ = A$ + "D>Nc>6cihHVoFihDeiMWM?;n#_?l9G^6:`HNQ]JFK\J1G;g]QOl97kc<NJ]5"
  218.     A$ = A$ + "J\fV7V<OIMLmi5G;?8X_WHdF2V^S_?n4inPLMUe?8\HIO>K>Peglif3;Fcem"
  219.     A$ = A$ + "\MQM_h2fiKUilBON7SY?`eL]ab5hm]VkOMndLMA#>W;gilYn#^Wgg2GFoo^^"
  220.     A$ = A$ + "XD6GM7OOjCD7PTGT\N`\^AbLmhhjTS3mDhF[IYmYLXSSMRE]NYL>Zi[K;OCm"
  221.     A$ = A$ + "D^3WSL?3Efem?GWVE_1n>ijjP_VL;k_8[9>gfN[aGgBlgMj`#8;I\eg<WX^2"
  222.     A$ = A$ + "[GnL>GWa>dig5kNFcIJ;ea[5LgM^cM6HaPlm4G8h7^G43A=Sam;[=<^n`DRM"
  223.     A$ = A$ + "PI]N[TYVK[mNnoNnTijThFJ3`^G:`J#HZcaiEgMVPck]L<5;mWH=gK04KcJD"
  224.     A$ = A$ + "kofo9oMG5\=SCFgfmnWLO]?]?=6^m#__iTeoARQE6G<;h8`CR99GPg6Hd?<\"
  225.     A$ = A$ + "_aOEk6M<\?Xam1jNL_=Cm3b3kMPaJ0m3<G=:l?XLmUHY?3\NDF;L]L7KOH8T"
  226.     A$ = A$ + "c`_L]a];haUH??a_h;Phfdn4]7>W8c3FhHS713:VSfVfDa71\53SMJJ_7bf5"
  227.     A$ = A$ + "jdL_AKX]5[Sjin:N^W7<5N?ae[9V>]MgY33iNEmfPYKIm<DLO^7i8WkcGHkJ"
  228.     A$ = A$ + "]\mZOcK[>1>>G7e1k:>ff:g?Sio\kC:7KMAL?THNJkY]NSk?g3VheQeMjm4^"
  229.     A$ = A$ + "^kGnQShi^c9k]a_]Y\9kndC^?NYocK[\ZagZmCTG=VFSWe_3idUP]iLVfSC6"
  230.     A$ = A$ + "KbL]I>#`nW;WkYR?Vf1oWS=nN\n#_GegN[E>`lkCL=<e]F?WJkGbmcWU>?=Z"
  231.     A$ = A$ + "QUMO2AOZg6\E:oMSoUAlUYOjlKLna^d^h_>GWY`XJ7FLmknmmA7hF>>HJWW:"
  232.     A$ = A$ + "aD<Fo\ScU?k8gI;]nDji;odQ<?J`UhdF`ig\n8M]?N^oQdZ?e6kJY\FJcm]m"
  233.     A$ = A$ + "6Nck5<Q[Oj]P=e?L`fPYMM=T_[eS?ai7KllDm9^g;bnb8aQPmkm56_S]gOTn"
  234.     A$ = A$ + "RWeUbDeQBY]iURUQca?2KQ7jL>AkM9HA^OAVG<]f\Y_H9W7dSL8bQ_c]7S1b"
  235.     A$ = A$ + "eX\UXSTLN`i1\=W7NUafN]ahmlY]O8l]LZak:i4?c[UfJ[aPIih#VZlk>THI"
  236.     A$ = A$ + "TCGRmiJHKZO4GOQ1GSLO[7oQ>NKcMVi0<73MmmXec5WSBlMg^mY=AmUJL[CQ"
  237.     A$ = A$ + "W=GZP=Vbn:fS#>PL=Ym2]gM8U[CiKJ;96cm?Db_gB`fX7cmiWi`dV_I^6Lo_"
  238.     A$ = A$ + "]iI`LLI>1iLjhMcc1dG16]Uc?o7noVc2jGhn8ei>5ceaL[_fhk>fNM9NNYnl"
  239.     A$ = A$ + "a\?KMQ<6N6\<bI>nY8_\Donf7lHa#3^lIOV=3<fM=?k<n^K[=MRM1mIF3XNn"
  240.     A$ = A$ + "QM\UWZ_`JJ[\][meKf3C:bSXifPG5kgBL=I^9]VmJ=fOeMca\FhKbCNO<`FG"
  241.     A$ = A$ + "RWA8KhHIVi7M:kcXkGmg>OP>56oFia`XUh^JlmUCn[aUX_B\Seee#]MoNnUW"
  242.     A$ = A$ + "lLiNWG=OaAoGjS\eC^hF>o4[4W7OegCKTO?7;Je5`WGR>E^GiH5OgZOj<GAh"
  243.     A$ = A$ + "H8OKk:n^S]GGbakba0>c7F=mlJKfS1iYSA=M:fSf3jHb_O<<=O?fXW3G_f<h"
  244.     A$ = A$ + "j<W2dOUfSM<8bJ8QG76?_Wl7ln>G4O?_jn9RhOnN`kVg\]A7;nHiNYlBRORo"
  245.     A$ = A$ + ";i#OGWCHmJGIm8O^Te<N9>U?[ML]=_]eHb^bJGBfh4BI\YRAn#SYeGLjg\=B"
  246.     A$ = A$ + "OWfV;o_h?Q[ciRi7kljJZ6e;3WCadL]f5OW#gXi3_EocY]RMkM]nbki8Og9F"
  247.     A$ = A$ + "GaAFGO^;U5dnLj[mJJKfak2k]IMTUi:biRUejX]3W;5Y_5cE;H<gW>6hJKkY"
  248.     A$ = A$ + "^#n>^?1GFcNL[<m1iL]Tn\ejITkC<]T?OlM2Kf[aHQalh^CK7UHJT_:S7caG"
  249.     A$ = A$ + "N<KhK;ci1IP\F6eGQBL8cYaL^=knl8eD3W;S?efFF?Yel1WkaS1^6oT97TUn"
  250.     A$ = A$ + "hhLfZUHn;Iao3kAfGkcbn8o6]Fon[9^6^C;WHmSE<K=Ma3k_K[PoLUhITS=n"
  251.     A$ = A$ + "Ml=5I1`Y`jW?_J]MULY<]fXX]2>GKRdNIOY56kc7KKe9^^FOY]K[AFnLRcMM"
  252.     A$ = A$ + "<^LMOajXEMK[ELAYnM^6]D?`lkh3>n]_VcOk>e>aLddhY[5k4h?[CYe#FAOi"
  253.     A$ = A$ + "M>4BNT;]?QL?hlF\K1NeH3dn=jY\nCijI]ShnH=g;7;<=<hbYlWUnLS1A<>g"
  254.     A$ = A$ + "SnIcSVc?D\;VcaeJnK]DM^n\bi^Vc]=7gbi`dkSiiIOL]KS`gKc]bH?[`6Jj"
  255.     A$ = A$ + "Kk^8[hHfhndeHhF`PPGW3OHnV\Ng0?[_RVONn?R[e>iadF=<igZeJdCoX0kb"
  256.     A$ = A$ + "L`h]mZN]3mU\oKlok3oA6HN?h9n=nO;Y^cclIXN3gOWhga7PiZG5[a5`cI]T"
  257.     A$ = A$ + "_f[93`mlS??1>`l`>fc<NN][hjLLLZIONKI\?klVTJUCfi5]kagJ[NH]Sh:m"
  258.     A$ = A$ + "9kjeemRk]mhaGJcggDmKLO16i_oGeOWQn#aaSa<V_P]JilF>G\]F01OgZOIP"
  259.     A$ = A$ + "eilXo2ZkI]abMoaG6Q?[8g<E<cjoUhJW30lm]j[Wnb3_FHJJk3i#n:LZi>3V"
  260.     A$ = A$ + "gceAL=fk`>][Y3[ahEknGi^P[WCOZmeXgRniWkmMGkb`ij_6HI]McW1GGjO_"
  261.     A$ = A$ + "e7>G?[YgL<=hj<ecbLL:Ll^03kiQg_?W[GMKf;SLO`^eLQihHjhKUhkLm6[F"
  262.     A$ = A$ + "^deOO]=HL^^F8ImAR<I9fKH<L_jk]6[=6cL<f7b]mcNgi2MI^dmJ>TC]E#WB"
  263.     A$ = A$ + "=L=]AgK;K5GGK\4Cfjn2o[ClTedEnIdOik;?WIlj>Ge7[>bMl9Ln^kLl8SY7"
  264.     A$ = A$ + "6kbPUYfCf7nKg]I_m4?oKaO`B\]_gea0T5bLlDcWmdVQidBM1`^OLkholL?o"
  265.     A$ = A$ + "VKVoS>nGJSciR]W?[5;[_Zbad2IBloO=`d`^^OFnm^_Eg2^fJl6kWSEK^>>4"
  266.     A$ = A$ + "JC4<TaF3KmSBOQeemHj9hoALG`T[DH>NaafoG^N8Z>4`akSiVif`meIUa9di"
  267.     A$ = A$ + "mcnfof7diZNndV[[CAJ?N9nM_=\[;?V3?mU06iFl3]<NBm=Uh?YmK>n:adHS"
  268.     A$ = A$ + "O=[GD\E?6^NZj1[T9ONV_ZJm7>>5P[#FNZH[Y?0\gHm]foZ_dVcjD_Qb1RQ?"
  269.     A$ = A$ + "fhgi`fMjn9ZOWWnMmM\iYonOE3n#ac#cUIZ[fWcn\>GgKAn0>GWLc_ce;QB>"
  270.     A$ = A$ + "S=4l:fP\V3EFCK?R_[QWii:nWBkbleXo7RWQiLWcARiH\[gR_d\^[Gj;^kG]"
  271.     A$ = A$ + ";bWn[`JXgY<GE>o#L>V`mfDhJWc5fgVihfi`ARSClkYRMhUbmh\_7AOacnSn"
  272.     A$ = A$ + "M^i3lD?fVOagagh`[`Pcan=?OiETClI>fd5OK<M9GZg_d6#nnd^P<hI^6L]D"
  273.     A$ = A$ + "n3M7POYm2;N6IF\=iiTS5l]j<hk5GSllBcC7H9fW4J96keHlbbldSOgijhVc"
  274.     A$ = A$ + "k;`VL?WS]E9kHLEnl8kW>Tifj#`mj;mN4nO36n3n>ok]iGmYn6_Qico:?aSM"
  275.     A$ = A$ + "O<hMd10ca<Po_f6_<b9RoG2caaaS^5h1o7^Ra62bPV3HadHO6IKUM=gMY\]h"
  276.     A$ = A$ + "j]5C[mjF7;<gW=f5L=fUcJ=9e=bnRXkIaTC^Y6[;n]dg:jSAgli4>^Kc^>A^"
  277.     A$ = A$ + "64O:^nINb7Mccl4?j3PYCVo;l7nYNdQnB>^?bCm83<OWonH<o7hlHhL`[XWP"
  278.     A$ = A$ + "fSa3P<YnXkHhH^P7\=Omg?V=;]e_]aCS_e\O4fZ]f]=OLDK:^F[`m\H#Soem"
  279.     A$ = A$ + "`HgORAG3OoHf\A?XiXnHSe>TL?oIgeF]HJaelJR[1[i[gP7ObGe0N\\?=]gg"
  280.     A$ = A$ + "2;^fS?an8CKM:I^`<G#Ho2<kLeU1Hn]e>mfb4CmfHcfllS^:RkahP4C[]jL_"
  281.     A$ = A$ + "7gJh\6ffn\mMMQLZaGjSiEV[\4GGj;NJO]4G_c<WW:<?O`N=bT`ho=HDS37i"
  282.     A$ = A$ + "mi`dj;:HR3EMRR?1KKmcah:1V6mIl<1VgkLPAlLR[1[SN[a\Ikh2L^kkMgVm"
  283.     A$ = A$ + "M9R19mo>]EG3O_:HhFaeGKo6iJcKVSm6HlF`cbH;SaoNYSgdfbXoHKm37L]H"
  284.     A$ = A$ + "#N^`[PcMO9>gGb;afSi?^i4oB3GOYVo<]EWa:?6Vn#R[CO2Df6\\Sa6f\K5C"
  285.     A$ = A$ + "W[oh3c_[7j[JWZecFHGCcfVS5m>R1]>2`FLjkMb^?UC\oL_E;^T;Ao`Q_4m5"
  286.     A$ = A$ + "7KeTW9`dPUNVgiS>9Vn#S[YMXO4Inc#V7k?]R[A>Wa3RjXJY[a8;en\k5gkb"
  287.     A$ = A$ + "LHAke>VRlN>Oe`dUhKbW6aJGVOl;alV9OYUGLm1_6^N_QOGPOh9^Ffak2IDl"
  288.     A$ = A$ + "eX5L=fZZ]nP]f<hh]5;CmSjakJSRHJRYN<KeUhJnOh_>gcFg>>>GkjJ5h]En"
  289.     A$ = A$ + "B;6Kc1^h86G;]CN`aeS`X[aanRgCk4<=HkC=LmBe1Tka6>_c1Oc[gDk8GiG="
  290.     A$ = A$ + "nG?7V^<FK>>Rk`H[9nlS=njHaGBnRGJ[FL]HKOOb7M<nCN?L=fOV[^XcL6M\"
  291.     A$ = A$ + "ae<Vd>_cL=oemjNWk:^^6I=UiJBNihJe2KMVR>1cgha6OM<i;5Oa7bGfeemA"
  292.     A$ = A$ + "RY_13?bh>MDafGg686KbMF[Sa5aG36J9e2n^J;Va^fm90]733>Vj;Qf2[c3^"
  293.     A$ = A$ + "^<1>Do`Mo_ciVUcGeUP]=7JL=?4kW]WjlU2^6KMDSVSJKn46GKMZdBmXTi=b"
  294.     A$ = A$ + "ed<RgL=`_>?XBk`ROM]>emeJ`_\fBmIo\KflYOn?m`[lIWWI^NmSMMW;7R[i"
  295.     A$ = A$ + "jP=lY6W[a6g;J;FG^7K\eaT_46_;m17kegDK95f\?UaeHWfLW]TajRSaiAL["
  296.     A$ = A$ + "e3:<fFCl[cSZ43kLo6lhP]hOmoG3hfohGhc\i?ic\I073o2_`mN?o_Qieae["
  297.     A$ = A$ + "UPD;;NnciZ6HK6G?\oJmSf2K]SOZ?_:khjeQ\F`]na^1Go>W1GOTS_F\]SY;"
  298.     A$ = A$ + "i2ZeLWE6_]S5D=OXa7HWS7iNGW[mW^VAPOdPMQoDoQo1KnmoDgeF<_2F6Vg3"
  299.     A$ = A$ + "66B<=o7>6``XG0KePYilR=iaZ7DSeN9i?K`Wnmdj4h]9nB8gIL?>FMQF>?<>"
  300.     A$ = A$ + "ih[`eXCbaa^eLWU<7Skm[9V>WoD6S;OWa23f=]m>7ko5l<<4_biamO4`fPYM"
  301.     A$ = A$ + "MAgmPSZi9H;\IcoM=f?\>7hcIBKeFW9RQ_eP[_R]NYGJmYDRA0KG^^ViJFP["
  302.     A$ = A$ + "]KRU5?R]f1ljc?3oI_O<]<O7[eJI^b0GS>4J?CE7M^f]^DKf3[KaGhi37nL>"
  303.     A$ = A$ + "gIH[VlU9^VkgK[hJW;8L?];HK`DD;J9nfmW0\Yc_9VD<nL\DRWQAgPeZ^nOk"
  304.     A$ = A$ + "n38i#KZe71HccmBb=>gSGRe#J7GOk2GS_hL?\dakJZa0[fhKK^ZMml5<YhCX"
  305.     A$ = A$ + "4o>7QMNHGKfdoKgo20[gBkUeRUUj;>aUkl3kHS]jhjm3ajK2H2_G[QY?eaed"
  306.     A$ = A$ + "^Y=B^P\V;fVlWEjOK=obaF]k;F>>IVW<ag]#^6RjJJ\[aiHgfm99cmmD]EV_"
  307.     A$ = A$ + ":]Vo<[[DG3c^dc;ni:^fe_<`e9>if8^6Vk3WgV;I\\[a^>W3f6LCinY9fE]f"
  308.     A$ = A$ + "`LMF2\=LYOgI\eHSVa^fc=HIn\k[\^FQK\ecQ[aN=Hjal5gifU[AB3aILV]f"
  309.     A$ = A$ + "U]]liJMRW[O23eJm]LL][U:4kh#MZ\Pi^iDf^]>aH>JMO[IB?kM7;:7o;n?^"
  310.     A$ = A$ + "7BRMMbI^[IXlnL]CWGM_o4okM_ePf^[=KC]n3biRH[ca=O<OaikY]iNfea6?"
  311.     A$ = A$ + "MZ`W[SQ]hjL^ND>GW_=Q[9o0XSbjDaacIGaf`POMMhUHO9_eD?HD_9`>_=\6"
  312.     A$ = A$ + "?5[1KMSZ]Mg?N1CWegY[3Q_i_b_Y1lJ]iGYfX1ccaP]J`]eRafib5]k>Nnb1"
  313.     A$ = A$ + "Gkj=j]ML]<e?S[mJJgF\m^4_mmUG][gohL=8JHOlkJ\m7lInSfl;mNoM_1cn"
  314.     A$ = A$ + "FO?ob3hM\cc[?aCo17d50Wn?c[FSWdNObGnSNOf\C\]ie3M0l:HGe=TnRgmi"
  315.     A$ = A$ + "n27GOUm]L>LM[5GK=b5[[RiNDR[MY[1f6faVbiC9nBC>\P1GW<HLgW>MmF#L"
  316.     A$ = A$ + "XHj4G3fEl[f\M]#gl2bmSjF4WkN4>f]MnJO\a>Wbli:^Vj\jL3GW<gGTG#G?"
  317.     A$ = A$ + "2dngJP[a69W?`?I<^PEM>A:cW;m=GOX[Q[aN=hJ]ICk?mm0IB7C?g;SL_7cm"
  318.     A$ = A$ + "WS>Vnb4GWceSBLM=\mA7g^0Oa[ECkL?H]V^6aMkNB9H#HN_hjBlZHFablO6F"
  319.     A$ = A$ + "Ko[H_6LYWC?_lO1GGFCgiNZS[A[mJo^S[MOiYfljhf:^N0KclEiJnnF;C_bf"
  320.     A$ = A$ + "M6gmfjK>ikFOT;aee`dgPI_67_4L=G3e3E3_jgiNZba6[LKR?Wae3[aAElQm"
  321.     A$ = A$ + "f<^>ili8OFMIY]kUR_Mm:?aej;nDhjBkeY_kVCld7P<oIMoZkhjFXdNmU2^>"
  322.     A$ = A$ + "IdWQm=G3UF2nfm]>no96L:o`M_0cj9SSgm`bbmUdJhj^oeMLM;dU2^>_7Bob"
  323.     A$ = A$ + "im^gPjn1HVK=\9GKOg5F\FRodGlBLM^f3ijZ6\eBFJkgiSAiiem8_nhE]^li"
  324.     A$ = A$ + "HmVUhjL]DhB1GkmX[YK^?PjniG]a>fjlbHN;aOj;>_Z_g3e3jEfgM=<J#WaE"
  325.     A$ = A$ + "OVmOXdf?OOiiCffcUNmL_fliHmQWaGO^Ug\4?Gnm^>5K^d4[G^gNkh8EF?Z>"
  326.     A$ = A$ + "G>cH[CL]SeTnEknX3mgPUa70[=DaiedA<V_h7K\`i4gaegn`e]bUf`cmbkdg"
  327.     A$ = A$ + "LafOK_h_e9aQ>Vb^gKka2V6NHOmmJkgj3N=O_[I_^fHMei]W?7Y>^NLLMnNR"
  328.     A$ = A$ + "9>m9Wj<Y5\HRcioZ=JaeHg>?gcMNjaH_^lijjKFR[?WS_NYhjB<>nTcJ#Pa9"
  329.     A$ = A$ + "KMRDc?lY\eVc=;\KckmgVkLXOhh7#?6k>^N::aee63k>^nN[QaFSiDSI>O;J"
  330.     A$ = A$ + "1CG3KSMjL?k^UcUhjaZk\>_LnL<Gh#CQ[_EFOhk2?2VVnUL]J`lQ]DLMR]Mm"
  331.     A$ = A$ + "B8W_F;1GO9__J]VliH>cPL]BYfHHOYQ[cj>eeG2g;Aim67lHe>B[Lj?mBRA_"
  332.     A$ = A$ + "lhjeNf^cW[4[9GP[5K?5VnL7G?6kJiXSi5HjQmYf[l7OK`eRAUGbaUSQF_fc"
  333.     A$ = A$ + "f=Viahi<I]HdaeeIU387dJ_DKVK;^NKIaeZKYS[gNnL=fJ9g_[OPlQoDgbFW"
  334.     A$ = A$ + "QgcHKae3c1[]8_I[1S?kR]kc_jMS?GS]FRjN_JmQgae3\nQWf[g5O`g5Ogcl"
  335.     A$ = A$ + "fakkSSmfaW[S_EBgP[3O>EV^S[_KOBIle;9OG[5chV[mJ[o]N=U^MlihLhZT"
  336.     A$ = A$ + "L_ij6<m5Jn`W2G3WhjaVcEKS=hJOGb^_4il<i>gi>g`>?#jaI_<nLg7L87[["
  337.     A$ = A$ + "KfoNjhjZhJgCodNmf4OM9Nei>BF7i9^]EVcEgWlfh;1O`QjhjUS[7K^L=6>>"
  338.     A$ = A$ + "]g^]HgihnN]EKlUP?hBVCl>^NNLM>?]ZQ]7cWi31gg_]VW_4l1GbH\ciTL7G"
  339.     A$ = A$ + "O?L]i3?m3?S_>mWn#SUkf]KW_T\E3ij=\i5_S[7gN]fY?5``SamJA[>O9I[F"
  340.     A$ = A$ + "jnl5_S[O0\=hJ`;7K<K;LOmBj1i;UlUEBj;N?njjl]9LMgOl7T_4]E3INakh"
  341.     A$ = A$ + "j7TiNgm1XS=NMYnSO\aCW2lUZ]J8`eLokJ0D7GO?<=aWLKbF]L_6b_\]E;Y="
  342.     A$ = A$ + "knlijN<[1aP=>fHdM1KO9k?ninlfZEbiRM7GO?L=SYea6O^;lU\ohGjfZUbi"
  343.     A$ = A$ + "]I7Go8g<nF7K\i^JcnBdOl;]a[N::[A5Ggc^Tae<NA7KLiJ`\7PM\aIM<maU"
  344.     A$ = A$ + "`VmU<^6i2_o_]k3^lUdlenB>ogCAFoI\>LVHkS=Nk#QYY^A9^JWKV7K<iJaG"
  345.     A$ = A$ + ":aIgcEfh4jl\fB;]GO>6g=g7>WfhEGKQ?fHaefVmiNLfM<mdTaIWhJjgdodc"
  346.     A$ = A$ + "EL]HKZ_jc5O`_D\IgSY^=Rn9`fPSe>fU0^>GgS>fh`^=kfHFgPkSYEkD>O#:"
  347.     A$ = A$ + "cSfi>^nHS1gG<nQL\aQM<maU`OLWoVl:eIjiX=KeII\eWSnPWf\?WV_GM<mf"
  348.     A$ = A$ + "AJc6<]LI][L\aU[9fVH[gWeRR[g:<W[8NGHdSL8[oUcQh\a>M??IK?IMSO9P"
  349.     A$ = A$ + "[iNKOj3^[aP\7MB^;Bn#V[^Kke=NggkeQLM6oL6GclHSa]NO=nFLN1?;knMX"
  350.     A$ = A$ + "<HK`J72\m]mlWe]C_N4HKU7MM#k\R_jNJOHcdeC#RKGfaC3>gWENVWhA?HiR"
  351.     A$ = A$ + "ofX_hMkd[?E>7]ceh`S>VL5HdGinecJRO0W>hgmEn2H=l1^FM9UlQbOlKKi5"
  352.     A$ = A$ + "_SYgOD6[][o2CEWYicUS=^M>N][MDme5<=mGY]j1LM5<m#\0Gi?^kgfkCL=i"
  353.     A$ = A$ + "]kHSEKUaNB7C_O9afF?J`eZSlJ?O>fHgi\G_VaFci2L<j;biA3lHfZQYMP^P"
  354.     A$ = A$ + "mMna_]Tk\NMTMh8bK1n59f^4G?fcXS=fM:6\dJio[H6kK\Oinbch8hJno[IK"
  355.     A$ = A$ + "I>o9>fhfah^Og77bJ87KB:on<_c7MbWE7K\kDlJ>^e6CMZ_k1^o[PYYoSo?m"
  356.     A$ = A$ + "[7RiMbY8^^?^d7ObJ8gl1=gc\S=fM:N]FKD8^E7gPL>ReaeccmhX?M8]KKLT"
  357.     A$ = A$ + "?0F>UMgShBc75]1`7SiWkn4GK=JVcG6?oMLmdfXO^gg7iH;:gY22kfV;]dGl"
  358.     A$ = A$ + "FaDk:[=AaecMMmoW_O=FgA9VHlW_LOAjD5GCkeJI5mA7SlXemindV:WcfE]I"
  359.     A$ = A$ + "^W`dbe6o8o=a<b4k0H#heJol`m6;]G_DL]i=k#Q[7Sa_5[OMjKFkmk;lg6iU"
  360.     A$ = A$ + "^?gg>8NFiJE\eMjQ6K?V=#WC78WR]9\EQ\[kWN[1N8[[\beJVUP[9nl31^Vm"
  361.     A$ = A$ + "#`Ga[hEN3GfWhn2<O7gKk:nFlLW^mAX36obca1;[SUE7GGP;hJI]c0n0o[Mn"
  362.     A$ = A$ + "BiLYRgoBNh7NHMj4ii7iANM3b^k26QSGKek2^VoonHL^nbobNIKnZojOUgLo"
  363.     A$ = A$ + "2oZO=OGKNSO??nmaOg?nS?lm`lO:al;a6>b08;d];On#R]ci2FJ3M]`e^GkX"
  364.     A$ = A$ + "n3`4JCFL\HIT91WS\[<bb_]g`KN#>6iFiURK#?a#MXDP[GZOhL<kJ>ln2OAO"
  365.     A$ = A$ + "93]7`SLOimYhfU`9VWn>j3Yo1LNilhVWgHC^SU?_Yd_lB\m3hGn2mG7M3`RY"
  366.     A$ = A$ + "MNBQLWfVAn3IKT?1k:GJSZDVggSmlFaBlOcal[dgTF`eL<>_^FJ\1hC=]1\a"
  367.     A$ = A$ + ">7>MKaenIjCY_5odYofJ8igiCnKO\5ijd1Tn=n]N^=o<oDodeafcHoZ<FI]A"
  368.     A$ = A$ + "Ocgm4?j=hITcPaF5fY43G:OFCFM:6iH\1>5^RoC^^[DjCj3<_F79fJd;djL0"
  369.     A$ = A$ + "?]7_4LjB]I?6FW[_mgM<mU9acMlA;]Og2>^T5G?T_i[SWfmaFTadn\nGGJ7N"
  370.     A$ = A$ + "<kcCa6c9f4W2SU[YYK3^VOK^eQ9d_\=him5S?2ONO^O13__CG^TcgCagSRYO"
  371.     A$ = A$ + "bkRS5;8^f??\N:Nei`HAQAN7LHV?Xil[\5LMR]afdHhJ`[SQ][]^?F3GCnX6"
  372.     A$ = A$ + ";nNeIMZ`fg#O2mooloR_gS]hEW>bThK[E=Ui7Zcl[l]FLM]abJH<N_jHB<=f"
  373.     A$ = A$ + "\[5SlDhj]Qe6E=<WeVCj;NVO_T]Vj;GgDZUc<\A[^ZS=F^Fo9adS_Bd6kDWP"
  374.     A$ = A$ + "8gJJofl>E^72Un_:V0iO7[E`dUnL?W=VM5KGjC>O?jZbk7Im]8gOa;acbUaF"
  375.     A$ = A$ + "G;G1WB\]9dmPn^NLfMZTZ=fg?`j0fEfY=?ITcH\C>FN^K1K_]bcUaKciZfmS"
  376.     A$ = A$ + "HmBVkcJLR[1W\=]eF^gG:fM\hBXo7M?6g?i8]CMJ<BkN^NMVHJ778mmcLTE3"
  377.     A$ = A$ + "OWbVC=6GK3SoWR1G;O#IL3d?labAfBm]H^O_5\\OORSYoWLVmiLEKUG[YK[C"
  378.     A$ = A$ + "MJ:BKLH;7lYSOJ93VPJ[2I?[Y<H[eZ]QiHiLhiTcOihN]VaFnno0OPKZ1>a>"
  379.     A$ = A$ + "N_Ue8gHhmUhGReFFbIMVI>a\MHK1Vn1n?_mgbC]iIOfnhGgYMWnhO\?n#nGa"
  380.     A$ = A$ + "nf?fCnS<8KUn6:N6OUCIbJLZG8e9`k;UW]F^Bl5GgmA^PiNTk?^ghj`e]Fk1"
  381.     A$ = A$ + "VkCJ_bN?hmQ<o?I`[Nl9GJ7fHWY_UnJJCXkYkcMW>64f?ZaPGn#_o?l`[8K2"
  382.     A$ = A$ + "GnknKlilg?f4]5dQlFO7o#3HMeM=66ELJ9k_EjOSeO:jBhjhmM73gY>MhYDO"
  383.     A$ = A$ + "5fB;InNHlG6VokYR^Z>eY>eY>eY>eY>eY>eY>eY>eY>eY>eY>eY>eY>eY>eY"
  384.     A$ = A$ + ">eY>eY>eY>=?moga%%L2": btemp$ = ""
  385.     HBCM$ = "MBO2L4G8G8AG>C<B2G8G8AG>DC2<G8G8>GEC<BA2>P8F8F8ECDC2"
  386.     FOR i& = 1 TO LEN(A$) STEP 4: B$ = MID$(A$, i&, 4)
  387.         IF INSTR(1, B$, "%") THEN
  388.             FOR C% = 1 TO LEN(B$): F$ = MID$(B$, C%, 1)
  389.                 IF F$ <> "%" THEN C$ = C$ + F$
  390.             NEXT: B$ = C$: END IF: FOR j = 1 TO LEN(B$)
  391.             IF MID$(B$, j, 1) = "#" THEN
  392.         MID$(B$, j) = "@": END IF: NEXT
  393.         FOR t% = LEN(B$) TO 1 STEP -1
  394.             B& = B& * 64 + ASC(MID$(B$, t%)) - 48
  395.             NEXT: X$ = "": FOR t% = 1 TO LEN(B$) - 1
  396.             X$ = X$ + CHR$(B& AND 255): B& = B& \ 256
  397.     NEXT: btemp$ = btemp$ + X$: NEXT
  398.     btemp$ = _INFLATE$(btemp$)
  399.     _MEMPUT m, m.OFFSET, btemp$: _MEMFREE m
  400.     HBC& = _COPYIMAGE(v&): _FREEIMAGE v&
  401.  
  402.  
Title: Re: Binary Search Method
Post by: bplus on June 19, 2020, 11:47:22 pm
Wow Dav, you can remember that? :)
Title: Re: Binary Search Method
Post by: Sanmayce on December 11, 2021, 03:02:34 am
Interesting topic, it's nice seeing fellow members sharing their views.

A decade ago, this very problem (finding a word among MANY words) caught my attention, as a result, I wrote Leprechaun - the x-gram ripper. It is simply the fastest (on Internet, at least) tool for ripping x-gram lists (wordlists in particular).

According to my experience, Binary-Search is not the answer, nor lookup table hashing (per se).

Indeed "a billion words" as @STxAxTIC said, makes the game brutal, in such scenarios the really good solutions scream, so far, my biggest rip was up to 600+ million distinct words when ripping the ten years span corpus of reddit JSON comments being 300+ billion words strong:

 


The answer is simply using the fastest 32bit (the hash output, not the instruction set) hasher combined with B-trees as a collision handler for each slot.
By the way, hate to see B-trees misnamed, the real name is Bayer-trees, not that crappy lore stating Broad/Boeing and so on.

When collisions grow then searching/dealing with them FAST, can be done nicely (but hard to code, it took me several months) with B-trees, MOSTLY because when you tailor your physical RAM memory footprint to be small, that is, the in-memory hash table(s) to be, say, with only 1<<20 slots then Dr. Bayer comes solving the DEGENERATION of Binary-Trees, or the nasty Linked Lists used in many Hash schemes.

My point, with Hash+Bayertrees duo, there is no limit of number of keys, nor DEGRADATION, on top of that, the duo can be tweaked/tailored according to your memory needs.

Oh, as for speed, on my laptop with i5-7200U, ripping rate (i.e. how many words are recognized/found per second) was around 12,000,000 lookups-per-second till the end of the file - STEADILY, when ripping the English Wikipedia XML dump ~70GB in size.
Title: Re: Binary Search Method
Post by: Sanmayce on December 12, 2021, 12:30:40 am
What I wanted was a hash method without collisions, but after a brief look at permutations, I punted on that idea.  Numbers way too big for words with over 20 characters.

"PUNTED", hah-hah, didn't know that word, don't leave your donkey behind in the mud Pete, as the Bulgarian proverb says. FAST Hashing without collisions is possible.
My Gumbotron (source code in my Masakari thread) is a 128bit hash (using either 128bit or 256bit instructions) suitable for such tasks, it has only one weakness - the range 1..15 of key lengths, therefore it has to be prefixed with some bytes, up to 4, anyway 20bytes or 160bit is kinda the standard. For keys with lengths 16 or bigger, for what it has been designed, it screams.

Didn't have enough time to make the complete benchmark, yet some results, all the 466544 lines have been hashed (with Gumbotron_YMM) in 8 miliseconds:

 


 


Code: QB64: [Select]
  1. C:\qb64>dir
  2.  
  3. 12/11/2021  08:42         5,329,510 466544 Word List.txt
  4. 12/12/2021  03:26        18,661,760 Gumbotron_466544 Word List.txt
  5.  
  6. C:\qb64>"QuickSortExternal_4+GB.distinct.txt"
  7.  
  8. C:\qb64>type "Gumbotron_466544 Word List.txt"|more
  9. 0x01-66D12A342B6CCE8B-76DD8BBCDC717EA2
  10. 0x04-4DEB4F7BD2C7B41A-EFA92545B3FDB9AF
  11. 0x02-156418E71C1387D0-C56894090FE9BCBB
  12. 0x08-47210BB6197940FF-77FDA06EA6D5CE3E
  13. 0x04-59B0EE4F929901C0-8166DE482670F770
  14. 0x08-EB52ABD995A0EE6E-495DC42CCA812D52
  15. 0x08-D1ED8F252634F052-A7564B142920ADEB
  16. 0x08-590505A51FC4A007-1742348B51C5C667
  17. 0x08-23748F08CBDDA10A-B90F700EDB31A73B
  18. ...
  19.  
  20. C:\qb64>"Sandokan_QuickSortExternal_Deduplicated_4+GB_64bit_Intel.exe" "Gumbotron_466544 Word List.txt" /fast /ascend 300
  21. Sandokan_QuickSortExternal_4+GB r.3+, written by Kaze, using Bill Durango's Quicksort source.
  22. Size of input file: 18,661,760
  23. Counting lines ...
  24. Lines encountered: 466,544
  25. Longest line (including CR if present): 39
  26. Allocated memory for pointers-to-lines in MB: 3
  27. Assigning pointers ...
  28. sizeof(int), sizeof(void*): 4, 8
  29. Trying to allocate memory for the file itself in MB: 17 ... OK! Get on with fast internal accesses.
  30. Uploading ...
  31. Sorting 466,544 Pointers ...
  32. Quicksort (Insertionsort for small blocks) commenced ...
  33. / RightEnd: 000,000,353,900; NumberOfSplittings: 0,000,053,177; Done: 100% ...
  34. NumberOfComparisons: 10,162,539
  35. The time to sort 466,544 items via Quicksort+Insertionsort was 938 clocks.
  36. Performance: 10,822,725 Comparisons_128B_long-Per-Second i.e 21,645,450 RandomReads_128B_long-Per-Second.
  37. Dumping the sorted data (Regime=2)...
  38. | Done 100% ...
  39. Dumped 466,544 lines.
  40. OK! Incoming and resultant file's sizes match.
  41. Dumping the sorted data [deduplicated] ...
  42. Dumped 466,544 distinct lines.
  43. Dump time: 181 clocks.
  44. Total time: 1,227 clocks.
  45. Performance: 15,184 bytes/clock.
  46. Done successfully.
  47.  
  48. C:\qb64>dir
  49.  
  50. 12/12/2021  03:27                 0 QuickSortExternal_4+GB.distinct.txt
  51. 12/12/2021  03:27        18,661,760 QuickSortExternal_4+GB.txt
  52.  
  53. C:\qb64>
  54.  

My old console tool Sandokan (something like 'sort') sorts the hashed lines of '466544 Word List.txt' and reports/dumps the duplicates in another file, just for visual purpose.
In fact, the hash is 16+1 bytes long, so whether the 466544 hashes are in external RAM or internal RAM changes nothing, Binary Search applied on fixed length keys (prefixed by the length of the key/hash) will be fast enough, considering the avoidance of using B-trees for maximum speed.
Those 400,000+ keys are not a serious benchmark, though, since binary logarithm of 466544 is 19 - it cannot stress properly the search, moreover the first run is not indicative since the keys are not cached yet, also 17*466544 < 8MB will be cached well or not depending on L3 cache of the CPU, more older CPUs are up to 4MB LLC, whereas latest Intel and AMD have 30MB, so it has to be considered if such a benchmark is to be ... modern enough.
Title: Re: Binary Search Method
Post by: Sanmayce on December 14, 2021, 07:43:25 pm
Having an English wordlist containing 15+ million distinct words, taken from 40 million titles, is worth benchmarking.

"As of October 2019, Google celebrated 15 years of Google Books and provided the number of scanned books as more than 40 million titles."
Source: https://en.wikipedia.org/wiki/Google_Books

The attached (ugh, the archive exceeded 20MB)  wordlist is ripped from "Google_Books_Ngram_Viewer_Exports_2020-02-17_eng-1-ngrams" corpus.
It is a must-have PLAYGROUND when comes to Binary-Search etudes.

How to make it, yourself:

Step #1: Download from http://storage.googleapis.com/books/ngrams/books/20200217/eng/eng-1-ngrams_exports.html 1-gram corpora;
Step #2: Decompress *.gz and "copy *24 Google_Books_Ngram_Viewer_Exports_2020-02-17_eng-1-ngrams.txt /b" them into one file;
Step #3: Trim all the "years" statistics i.e. crop only the first field (up to the first TAB character);
Step #4: Rip (make a wordlist) the trimmed file;
Step #5: Sort (for better compression) the ripped file;
Step #6: The resultant "Google_Books_Ngram_Viewer_Exports_2020-02-17_eng-1-ngrams.wordlist.txt" houses 15,074,900 distinct words with length 1..31 characters.

The whole package (including "Google_Books_Ngram_Viewer_Exports_2020-02-17_eng-1-ngrams.wordlist.txt"):
www.sanmayce.com/Google_Books_Ngram_Viewer_datasets_v3_WORDLIST.7z (32.0 MB)

 


Maybe, a somewhat decent Binary-Search benchmark would be to "spell-check" the KJV Bible (~4MB text) using Google's wordlist, it amounts to Binary-Searching 793,877 words into 15,074,900 wordlist ...
Title: Re: Binary Search Method
Post by: Sanmayce on December 18, 2021, 02:47:28 am
Finally, we have a decent benchmark utilizing the Binary-Search in a speedy manner, it is a good start for all kind of experiments since it is creating as a result the .HTM counterpart (with all unfamiliar words in BOLD, all OED distinct words are used as a spell-check wordlist) of our .TXT file.

 


It throws all the words of KJV Bible at OED 2nd edition, which is 700,000+ words against the 1,000,000+ OED words:

 


The 64bit binary and the QB64 source code are attached, also the OED wordlist.

Spell-checking KJV Bible under 2 seconds, can we do better?