QB64.org Forum

Active Forums => Programs => Topic started by: Zeppelin on October 07, 2018, 03:56:15 am

Title: WordCracker
Post by: Zeppelin on October 07, 2018, 03:56:15 am
Hey,
I run into a issue with my program.
Im trying to make a program that will, when given a string of 9 letters and 1 key letter to print out all the possible words using these letters.
For example:
9 Letters: ABCDEFGHI
Key Letter: A
PRINTS:
BAD
CAB
DEAF
etc....

I am using a .txt database full of words from the Oxford dictionary and each time I run the program nothing is output. I cant find the issue.

Thanks,
Zeppelin

Ps. The program and wordlist are attached below.
Title: Re: WordCracker
Post by: SMcNeill on October 07, 2018, 07:37:00 am
http://qb64.freeforums.net/thread/42/scrabble-word-maker -- Sounds like what you're wanting is basically the same thing I've did here.  Give it a look and see if it helps.
Title: Re: WordCracker
Post by: bplus on October 07, 2018, 10:33:25 am
Code: QB64: [Select]
  1. 'SPLIT UP LETTERS
  2. FOR n = 1 TO 9
  3.     ltr$(n) = MID$(in$, n, 1)
  4. PRINT LEN(ltr$)  '>>>>>>>>>>>> 0 !!!
  5.  

Code: QB64: [Select]
  1.             FOR x = 1 TO LEN(in$)  '<<<<<<<<<<<<<<<<<<<<<<< change to this?
  2.                 IF ltr$(x) = templtr$ THEN
  3.                     ltr$(x) = ""
  4.                     count = count + 1
  5.                 END IF
  6.             NEXT x
  7.  

OH!!! This is screwing you up too!

Code: QB64: [Select]
  1.     FOR n = 1 TO 9
  2.         ltr$(n) = MID$(in$, n, 1)
  3.     NEXT n
You are using n for the word index from the file, and then changing n here when resetting ltr$() array!!!
Title: Re: WordCracker
Post by: bplus on October 07, 2018, 10:47:30 am
OK now get stuff printed in reasonable time!
Code: QB64: [Select]
  1.  
  2. DIM word$(84100)
  3. DIM SHARED ltr$(9)  '<<<<<<<<<<< shared to debug
  4.  
  5. 'SET KEY AND LETTERS
  6. key$ = "a"
  7. in$ = "abcdefghi"
  8. lenin = LEN(in$) '<<<<<<<<<<< for faster processing
  9.  
  10. 'SPLIT UP LETTERS
  11. FOR n = 1 TO 9
  12.     ltr$(n) = MID$(in$, n, 1)
  13.  
  14. 'LOAD ALL WORDS FROM .TXT FILE
  15. PRINT "LOADING...."
  16. OPEN "WordList.txt" FOR INPUT AS #1  '<<<<<<<<< moved to more appropriate place
  17.     filecount = filecount + 1
  18.     INPUT #1, line$
  19.     word$(filecount) = line$
  20.  
  21.  
  22. 'RUN THROUGH WORDS
  23. FOR n = 1 TO filecount
  24.     findkey = INSTR(word$(n), key$)
  25.  
  26.     IF findkey THEN
  27.         FOR i = 1 TO LEN(word$(n))
  28.             templtr$ = MID$(word$(n), i, 1)
  29.             FOR x = 1 TO lenin '<<<<<<<<<<<<<<<<< main fix #1
  30.                 IF ltr$(x) = templtr$ THEN
  31.                     ltr$(x) = ""
  32.                     count = count + 1
  33.                 END IF
  34.                 'debugging
  35.                 'PRINT "Update: file word = "; word$(n); " and here is current letters crossed off: "; letters$
  36.                 'INPUT "OK press enter... "; wate$
  37.             NEXT x
  38.         NEXT i
  39.  
  40.         IF count = LEN(word$(n)) THEN
  41.             PRINT word$(n)
  42.         END IF
  43.     END IF
  44.  
  45.     FOR m = 1 TO 9  'main fix #2 n's  to m's
  46.         ltr$(m) = MID$(in$, m, 1)
  47.     NEXT m
  48.     count = 0
  49.  
  50. PRINT "DONE..."
  51.  
  52. 'for debugging
  53. FUNCTION letters$ ()
  54.     FOR i = 1 TO 9
  55.         b$ = b$ + ltr$(i)
  56.     NEXT
  57.     letters$ = "*" + b$ + "*"
  58.  

DONE? Looks like the logic was OK, just the variable assignments needed fixing.
Title: Re: WordCracker
Post by: codeguy on October 07, 2018, 02:09:15 pm
Gives you every permutation of letters, numbers, etc. If parray() is ordered, so is the output.
Code: QB64: [Select]
  1. WIDTH 80, 43
  2. DIM parray(0 TO 3) AS LONG
  3. FOR i = 0 TO UBOUND(parray)
  4.     parray(i) = i
  5. PRINT "permutations"
  6. Permute parray(), 0, UBOUND(parray), np
  7.     x$ = INKEY$
  8. LOOP UNTIL x$ > ""
  9.  
  10. SUB DisplayResults (PArray() AS LONG, start, finish, np AS DOUBLE)
  11. PRINT USING "#,###,###,###,###"; np;
  12. FOR i = LBOUND(parray) TO UBOUND(parray)
  13.     PRINT PArray(i);
  14.  
  15. SUB Rotate (parray() AS LONG, Start AS LONG, finish AS LONG)
  16. ts = parray(Start)
  17. FOR i = Start TO finish - 1
  18.     SWAP parray(i), parray(i + 1)
  19. parray(finish) = ts
  20.  
  21. SUB Permute (parray() AS LONG, start AS LONG, finish AS LONG, np AS DOUBLE)
  22. np = np + 1
  23. DisplayResults parray(), LBOUND(parray), UBOUND(parray), np
  24. IF start < finish THEN
  25.     DIM i AS LONG
  26.     DIM j AS LONG
  27.     FOR i = finish - 1 TO start STEP -1
  28.         FOR j = i + 1 TO finish
  29.             SWAP parray(i), parray(j)
  30.             Permute parray(), i + 1, finish, np
  31.         NEXT
  32.         Rotate parray(), i, finish
  33.     NEXT
  34.  
Title: Re: WordCracker
Post by: bplus on October 07, 2018, 02:52:59 pm
Hi codeguy,

I thought about permutations and then thought nah! won't work with real words at varying lengths.

But I could be wrong, wouldn't be first time.

Wanna race? You modify your code for checking for real words and I will see if I can optimize Zeppellin's code some more.... post in 24 hours?


Title: Re: WordCracker
Post by: SMcNeill on October 07, 2018, 03:32:28 pm
Hi codeguy,

I thought about permutations and then thought nah! won't work with real words at varying lengths.

But I could be wrong, wouldn't be first time.

Wanna race? You modify your code for checking for real words and I will see if I can optimize Zeppellin's code some more.... post in 24 hours?

Here's how I'd go, I think:

First, reduce the word list to exclude any words > 9 digits.  No need to search for impossible combinations.
Then count letters.  Save these in an array WordLetters(1 TO WordCount,1 TO 26).

Then count letters in the target word.
Compare.  If target count > word count then it's a match!

************************
The word list is going to be rather limited (less than 50k words I'd imagine), and most searches will terminate rather quickly.  (need an "A", don't have any?  Quit the search at this point.) 

I really don't think you'd need to worry about optimizing for speed any more than that, to be honest.
Title: Re: WordCracker
Post by: bplus on October 07, 2018, 03:44:47 pm
Hi codeguy,

I thought about permutations and then thought nah! won't work with real words at varying lengths.

But I could be wrong, wouldn't be first time.

Wanna race? You modify your code for checking for real words and I will see if I can optimize Zeppellin's code some more.... post in 24 hours?

Here's how I'd go, I think:

First, reduce the word list to exclude any words > 9 digits.  No need to search for impossible combinations.
Then count letters.  Save these in an array WordLetters(1 TO WordCount,1 TO 26).

Then count letters in the target word.
Compare.  If target count > word count then it's a match!

************************
The word list is going to be rather limited (less than 50k words I'd imagine), and most searches will terminate rather quickly.  (need an "A", don't have any?  Quit the search at this point.) 

I really don't think you'd need to worry about optimizing for speed any more than that, to be honest.

Hmm... I think you are saying permutations is slower too?

I am thinking this part:
Code: QB64: [Select]
  1. 'SET KEY AND LETTERS
  2. key$ = "a"
  3. in$ = "abcdefghi"
  4. lenin = LEN(in$) '<<<<<<<<<<< for faster processing
  5.  

key$ and in$ might be input into the program at run time, so the letters might not be nine and in$ might be real words too not segments of the alphabet. That's how I would use this code for word games or word game solving. And the real words might have 2 or 3 of the same letters. I was planning on optimizing for such events.
Title: Re: WordCracker
Post by: SMcNeill on October 07, 2018, 04:28:46 pm
Here's an easy fix for you to optimize your code and run it faster.  (An increase of about 15x faster on my machine.) 

I'll post 2 routines here so you can run them and compare for speed differences:

YOURS (modified to loop and run for a set amount of time -- 5 seconds in this case):
Code: QB64: [Select]
  1. DIM word$(84100)
  2. DIM SHARED ltr$(9) '<<<<<<<<<<< shared to debug
  3.  
  4. 'SET KEY AND LETTERS
  5. key$ = "a"
  6. in$ = "abcdefghi"
  7. lenin = LEN(in$) '<<<<<<<<<<< for faster processing
  8.  
  9. 'SPLIT UP LETTERS
  10. FOR n = 1 TO 9
  11.     ltr$(n) = MID$(in$, n, 1)
  12.  
  13.  
  14.  
  15. 'LOAD ALL WORDS FROM .TXT FILE
  16. PRINT "LOADING...."
  17. OPEN "WordList.txt" FOR INPUT AS #1 '<<<<<<<<< moved to more appropriate place
  18. filecount = 0
  19.     filecount = filecount + 1
  20.     INPUT #1, line$
  21.     word$(filecount) = line$
  22.  
  23.  
  24. t# = TIMER
  25.  
  26. timelimit = 5
  27. DO UNTIL TIMER > t# + timelimit
  28.     CLS
  29.     loopsran = loopsran + 1
  30.  
  31.  
  32.  
  33.     'RUN THROUGH WORDS
  34.     FOR n = 1 TO filecount
  35.         findkey = INSTR(word$(n), key$)
  36.  
  37.         IF findkey THEN
  38.             FOR i = 1 TO LEN(word$(n))
  39.                 templtr$ = MID$(word$(n), i, 1)
  40.                 FOR x = 1 TO lenin '<<<<<<<<<<<<<<<<< main fix #1
  41.                     IF ltr$(x) = templtr$ THEN
  42.                         ltr$(x) = ""
  43.                         count = count + 1
  44.                     END IF
  45.                     'debugging
  46.                     'PRINT "Update: file word = "; word$(n); " and here is current letters crossed off: "; letters$
  47.                     'INPUT "OK press enter... "; wate$
  48.                 NEXT x
  49.             NEXT i
  50.  
  51.             IF count = LEN(word$(n)) THEN
  52.                 PRINT word$(n)
  53.             END IF
  54.         END IF
  55.  
  56.         FOR m = 1 TO 9 'main fix #2 n's  to m's
  57.             ltr$(m) = MID$(in$, m, 1)
  58.         NEXT m
  59.         count = 0
  60.     NEXT n
  61.  
  62.     PRINT "DONE..."
  63. PRINT USING "###,###,###,###,### loops ran in ##.# seconds"; loopsran, timelimit
  64.  
  65.  
  66. 'for debugging
  67. FUNCTION letters$ ()
  68.     FOR i = 1 TO 9
  69.         b$ = b$ + ltr$(i)
  70.     NEXT
  71.     letters$ = "*" + b$ + "*"

Modified:
Code: QB64: [Select]
  1. DEFLNG A-Z
  2. DIM word$(84100)
  3. DIM SHARED ltr(9) '<<<<<<<<<<< shared to debug
  4.  
  5. 'SET KEY AND LETTERS
  6. key$ = "a"
  7. in$ = "abcdefghi"
  8. lenin = LEN(in$) '<<<<<<<<<<< for faster processing
  9.  
  10. 'SPLIT UP LETTERS
  11. FOR n = 1 TO 9
  12.     ltr(n) = ASC(in$, n)
  13.  
  14.  
  15.  
  16.  
  17.  
  18. 'LOAD ALL WORDS FROM .TXT FILE
  19. PRINT "LOADING...."
  20. OPEN "WordList.txt" FOR BINARY AS #1 '<<<<<<<<< moved to more appropriate place
  21. filecount = 0
  22.     LINE INPUT #1, line$
  23.     IF LEN(line$) < 10 THEN
  24.         filecount = filecount + 1
  25.         word$(filecount) = line$
  26.     END IF
  27.  
  28.  
  29. t# = TIMER
  30. timelimit = 5
  31. DO UNTIL TIMER > t# + timelimit
  32.     CLS
  33.     loopsran = loopsran + 1
  34.  
  35.  
  36.     'RUN THROUGH WORDS
  37.     FOR n = 1 TO filecount
  38.         findkey = INSTR(word$(n), key$)
  39.  
  40.         IF findkey THEN
  41.             FOR i = 1 TO LEN(word$(n))
  42.                 templtr = ASC(word$(n), i)
  43.                 FOR x = 1 TO lenin '<<<<<<<<<<<<<<<<< main fix #1
  44.                     IF ltr(x) = templtr THEN
  45.                         ltr(x) = 0
  46.                         count = count + 1
  47.                     END IF
  48.                 NEXT x
  49.                 IF count <> i GOTO skipmore
  50.             NEXT i
  51.  
  52.             IF count = LEN(word$(n)) THEN
  53.                 PRINT word$(n),
  54.             END IF
  55.         END IF
  56.  
  57.         skipmore:
  58.         FOR m = 1 TO 9 'main fix #2 n's  to m's
  59.             ltr(m) = ASC(in$, m)
  60.         NEXT m
  61.         count = 0
  62.     NEXT n
  63.  
  64.     PRINT "DONE..."
  65. PRINT USING "###,###,###,###,### loops ran in ##.# seconds"; loopsran, timelimit
  66.  
  67. 'for debugging
  68. FUNCTION letters$ ()
  69.     FOR i = 1 TO 9
  70.         b$ = b$ + ltr$(i)
  71.     NEXT
  72.     letters$ = "*" + b$ + "*"

Yours will run 22 times in 5 seconds, mine 324 times...

The changes?

#1) STRINGS are gone.  Why do we need them?  Especially, WHY DO WE NEED MID$??

When going for speed routines, ASC outperforms MID$(x$,y,1) every time, hands down!  This is a major performance boost.

#2) The word list is limited to begin with.

    IF LEN(line$) < 10 THEN
        filecount = filecount + 1
        word$(filecount) = line$
    END IF

You're not going to find 10 letter word matches with only 9 letter words.

#3) (Not timed, but a huge speed boost): Changed file from INPUT to BINARY and changed INPUT # to LINE INPUT#.  This makes loading the word list a whole heck of a lot faster.  Even faster would be to load it all at once and then parse the words out of it, but who wants to go through the trouble for all of the few microseconds we'd save in this case?

Simple little things, but the affect the performance like crazy.
Title: Re: WordCracker
Post by: SMcNeill on October 07, 2018, 05:01:34 pm
And, if we create a letter index for our words (since we're going to do a limit by key$), we can almost double the performance.  (Probably more for certain letters which aren't going to be indexed as much as "a" is, in this case.)

Code: QB64: [Select]
  1. DEFLNG A-Z
  2. DIM word$(84100)
  3. DIM SHARED ltr(9) '<<<<<<<<<<< shared to debug
  4. DIM backup(9)
  5. DIM keycount(97 TO 122, 50000) 'letter/index
  6.  
  7. 'SET KEY AND LETTERS
  8. key$ = "a"
  9. in$ = "abcdefghi"
  10. lenin = LEN(in$) '<<<<<<<<<<< for faster processing
  11.  
  12. 'SPLIT UP LETTERS
  13. FOR n = 1 TO 9
  14.     ltr(n) = ASC(in$, n)
  15.     backup(n) = ltr(n)
  16.  
  17.  
  18.  
  19.  
  20.  
  21. 'LOAD ALL WORDS FROM .TXT FILE
  22. PRINT "LOADING...."
  23. OPEN "WordList.txt" FOR BINARY AS #1 '<<<<<<<<< moved to more appropriate place
  24. filecount = 0
  25.     LINE INPUT #1, line$
  26.     IF LEN(line$) < 10 THEN
  27.         filecount = filecount + 1
  28.         word$(filecount) = line$
  29.         FOR i = 97 TO 122
  30.             IF INSTR(line$, CHR$(i)) THEN
  31.                 keycount(i, 0) = keycount(i, 0) + 1 'record 0 tells us how many there are for that letter
  32.                 keycount(i, keycount(i, 0)) = filecount 'add the word number to the proper index
  33.             END IF
  34.         NEXT
  35.     END IF
  36.  
  37.  
  38. t# = TIMER
  39. timelimit = 5
  40. DO UNTIL TIMER > t# + timelimit
  41.     CLS
  42.     loopsran = loopsran + 1
  43.  
  44.  
  45.     'RUN THROUGH WORDS
  46.     findkey = ASC(key$)
  47.     FOR n = 1 TO keycount(findkey, 0)
  48.         w$ = word$(keycount(findkey, n))
  49.         l = LEN(w$)
  50.         FOR i = 1 TO l
  51.             templtr = ASC(w$, i)
  52.             FOR x = 1 TO lenin '<<<<<<<<<<<<<<<<< main fix #1
  53.                 IF ltr(x) = templtr THEN
  54.                     ltr(x) = 0
  55.                     count = count + 1
  56.                 END IF
  57.             NEXT x
  58.             IF count <> i GOTO skipmore
  59.         NEXT i
  60.  
  61.         IF count = l THEN PRINT w$,
  62.  
  63.         skipmore:
  64.         FOR m = 1 TO 9 'main fix #2 n's  to m's
  65.             ltr(m) = backup(m)
  66.         NEXT m
  67.         count = 0
  68.     NEXT n
  69.  
  70.     PRINT "DONE..."
  71. PRINT USING "###,###,###,###,### loops ran in ##.# seconds"; loopsran, timelimit
  72.  
  73. 'for debugging
  74. FUNCTION letters$ ()
  75.     FOR i = 1 TO 9
  76.         b$ = b$ + ltr$(i)
  77.     NEXT
  78.     letters$ = "*" + b$ + "*"
  79.  

No need to check each and every word for a letter, if we already have built an index to tell us which words contain those letters.  ;)

(Current run time is 488 loops in 5 seconds on my PC, if you're curious about the actual number to compare improvement.)
Title: Re: WordCracker
Post by: bplus on October 07, 2018, 05:01:54 pm
Here's what I had in mind:
Code: QB64: [Select]
  1. ' WordCrack mod 1.bas B+ 2018-10-07
  2.  
  3. INPUT "Enter a string to build words from "; in$
  4. 'load words
  5. OPEN "WordList.txt" FOR BINARY AS #1
  6. gulp& = LOF(1)
  7. buff$ = STRING$(gulp&, " ")
  8. GET #1, , buff$
  9. REDIM word$(0)
  10. Split buff$, CHR$(10), word$()
  11. filecount& = UBOUND(word$)
  12. 'RUN THROUGH WORDS
  13. FOR n& = 0 TO filecount&
  14.     c$ = in$
  15.     OK% = -1
  16.     FOR i% = 1 TO LEN(word$(n&))
  17.         p% = INSTR(c$, MID$(word$(n&), i%, 1))
  18.         IF p% = 0 THEN
  19.             OK% = 0: EXIT FOR
  20.         ELSE
  21.             MID$(c$, p%, 1) = "+"
  22.         END IF
  23.     NEXT
  24.     IF OK% THEN PRINT word$(n&); ", ";
  25. PRINT "DONE..."
  26.  
  27.  
  28. SUB Split (mystr AS STRING, delim AS STRING, arr() AS STRING)
  29.     ' bplus modifications of Galleon fix of Bulrush Split reply #13
  30.     ' http://www.[abandoned, outdated and now likely malicious qb64 dot net website - don’t go there]/forum/index.php?topic=1612.0
  31.     ' this sub further developed and tested here: \test\Strings\Split test.bas
  32.     DIM copy AS STRING, p AS LONG, curpos AS LONG, arrpos AS LONG, lc AS LONG, dpos AS LONG
  33.     copy = mystr 'make copy since we are messing with mystr
  34.     'special case if delim is space, probably want to remove all excess space
  35.     IF delim = " " THEN
  36.         copy = RTRIM$(LTRIM$(copy))
  37.         p = INSTR(copy, "  ")
  38.         WHILE p > 0
  39.             copy = MID$(copy, 1, p - 1) + MID$(copy, p + 1)
  40.             p = INSTR(copy, "  ")
  41.         WEND
  42.     END IF
  43.     curpos = 1
  44.     arrpos = 0
  45.     lc = LEN(copy)
  46.     dpos = INSTR(curpos, copy, delim)
  47.     DO UNTIL dpos = 0
  48.         arr(arrpos) = MID$(copy, curpos, dpos - curpos)
  49.         arrpos = arrpos + 1
  50.         REDIM _PRESERVE arr(arrpos + 1) AS STRING
  51.         curpos = dpos + LEN(delim)
  52.         dpos = INSTR(curpos, copy, delim)
  53.     LOOP
  54.     arr(arrpos) = MID$(copy, curpos)
  55.     REDIM _PRESERVE arr(arrpos) AS STRING
  56.  

Looks pretty simple to me.

Yes, Binary load most significant speed improvement! :)
Title: Re: WordCracker
Post by: SMcNeill on October 07, 2018, 05:26:23 pm
And another small modification of the original gives us another decent speed boost:

Code: QB64: [Select]
  1. DEFLNG A-Z
  2. DIM word$(84100)
  3. DIM keys
  4. DIM SHARED ltr(1 TO 9) '<<<<<<<<<<< shared to debug
  5. DIM backup(1 TO 9)
  6. DIM keycount(97 TO 122, 50000) 'letter/index
  7. DIM m1 AS _MEM, m2 AS _MEM 'just for quick array restoration
  8. m1 = _MEM(ltr()): m2 = _MEM(backup())
  9.  
  10. 'SET KEY AND LETTERS
  11. keys = ASC("a")
  12. in$ = "abcdefghi"
  13. lenin = LEN(in$) '<<<<<<<<<<< for faster processing
  14.  
  15. 'SPLIT UP LETTERS
  16. FOR n = 1 TO 9
  17.     ltr(n) = ASC(in$, n)
  18. _MEMPUT m2, m2.OFFSET, ltr()
  19.  
  20.  
  21. 'LOAD ALL WORDS FROM .TXT FILE
  22. PRINT "LOADING...."
  23. OPEN "WordList.txt" FOR BINARY AS #1 '<<<<<<<<< moved to more appropriate place
  24. filecount = 0
  25.     LINE INPUT #1, line$
  26.     IF LEN(line$) < 10 THEN
  27.         filecount = filecount + 1
  28.         word$(filecount) = line$
  29.         FOR i = 97 TO 122
  30.             IF INSTR(line$, CHR$(i)) THEN
  31.                 keycount(i, 0) = keycount(i, 0) + 1 'record 0 tells us how many there are for that letter
  32.                 keycount(i, keycount(i, 0)) = filecount 'add the word number to the proper index
  33.             END IF
  34.         NEXT
  35.     END IF
  36.  
  37.  
  38. t# = TIMER
  39. timelimit = 5
  40.  
  41.  
  42. DO UNTIL TIMER > t# + timelimit
  43.     CLS
  44.     loopsran = loopsran + 1
  45.  
  46.  
  47.     'RUN THROUGH WORDS
  48.     FOR n = 1 TO keycount(keys, 0)
  49.         w$ = word$(keycount(keys, n))
  50.         l = LEN(w$)
  51.         FOR i = 1 TO l
  52.             templtr = ASC(w$, i)
  53.             FOR x = 1 TO lenin '<<<<<<<<<<<<<<<<< main fix #1
  54.                 IF ltr(x) = templtr THEN
  55.                     ltr(x) = 0
  56.                     count = count + 1
  57.                     EXIT FOR
  58.                 END IF
  59.             NEXT x
  60.             IF count <> i GOTO skipmore
  61.         NEXT i
  62.  
  63.         PRINT w$,
  64.  
  65.         skipmore:
  66.         $CHECKING:OFF
  67.         _MEMPUT m1, m1.OFFSET, backup()
  68.         $CHECKING:ON
  69.         count = 0
  70.     NEXT n
  71.  
  72.     PRINT "DONE..."
  73. PRINT USING "###,###,###,###,### loops ran in ##.# seconds"; loopsran, timelimit
  74.  
  75. 'for debugging
  76. FUNCTION letters$ ()
  77.     FOR i = 1 TO 9
  78.         b$ = b$ + ltr$(i)
  79.     NEXT
  80.     letters$ = "*" + b$ + "*"
  81.  

Instead of a FOR...LOOP to reset the search list, I simply restored it with a backup array with _MEM.

From:

        FOR m = 1 TO 9 'main fix #2 n's  to m's
            ltr$(m) = MID$(in$, m, 1)
        NEXT m

To:

        $CHECKING:OFF
        _MEMPUT m1, m1.OFFSET, backup()
        $CHECKING:ON

*****************
*****************

I'm now getting ~900 runs in a short 5 second period.  I'd call that fast enough for just about anything I'd need to use it for.  (And a nice improvement from the original 22 runs in 5 seconds.)  ;)
Title: Re: WordCracker
Post by: bplus on October 07, 2018, 05:40:28 pm
I am getting 363 cycles for abcdefghi in 5 secs with Windows 10 Intel Core i5 processor without any preprocessing of the word$() array.
Title: Re: WordCracker
Post by: bplus on October 07, 2018, 07:24:40 pm
Steve your last code improvement is running 1067 or so loops on my machine.

Yeah OK, checking word length first does make sense and adds another 100 loops in 5 secs:
Code: QB64: [Select]
  1. ' WordCrack mod 1.bas B+ 2018-10-07
  2.  
  3. ' now with timer mod
  4.  
  5. 'INPUT "Enter a string to build words from "; in$
  6. in$ = "abcdefghi"
  7. in$ = LCASE$(in$)
  8. lenin = LEN(in$)
  9.  
  10. 'load words
  11. OPEN "WordList.txt" FOR BINARY AS #1
  12. gulp& = LOF(1)
  13. buff$ = STRING$(gulp&, " ")
  14. GET #1, , buff$
  15. REDIM word$(0)
  16. Split buff$, CHR$(10), word$()
  17. filecount& = UBOUND(word$)
  18. start! = TIMER
  19. WHILE TIMER - start! < 5
  20.     'RUN THROUGH WORDS
  21.     FOR n& = 0 TO filecount&
  22.         IF LEN(word$(n&)) <= lenin THEN
  23.             c$ = in$
  24.             OK% = -1
  25.             FOR i% = 1 TO LEN(word$(n&))
  26.                 p% = INSTR(c$, MID$(word$(n&), i%, 1))
  27.                 IF p% = 0 THEN
  28.                     OK% = 0: EXIT FOR
  29.                 ELSE
  30.                     MID$(c$, p%, 1) = "+"
  31.                 END IF
  32.             NEXT
  33.             IF OK% THEN PRINT word$(n&); ", ";
  34.         END IF
  35.     NEXT
  36.     counter = counter + 1
  37. PRINT "Loop count in 5 secs ="; counter
  38.  
  39. SUB Split (mystr AS STRING, delim AS STRING, arr() AS STRING)
  40.     ' bplus modifications of Galleon fix of Bulrush Split reply #13
  41.     ' http://www.[abandoned, outdated and now likely malicious qb64 dot net website - don’t go there]/forum/index.php?topic=1612.0
  42.     ' this sub further developed and tested here: \test\Strings\Split test.bas
  43.     DIM copy AS STRING, p AS LONG, curpos AS LONG, arrpos AS LONG, lc AS LONG, dpos AS LONG
  44.     copy = mystr 'make copy since we are messing with mystr
  45.     'special case if delim is space, probably want to remove all excess space
  46.     IF delim = " " THEN
  47.         copy = RTRIM$(LTRIM$(copy))
  48.         p = INSTR(copy, "  ")
  49.         WHILE p > 0
  50.             copy = MID$(copy, 1, p - 1) + MID$(copy, p + 1)
  51.             p = INSTR(copy, "  ")
  52.         WEND
  53.     END IF
  54.     curpos = 1
  55.     arrpos = 0
  56.     lc = LEN(copy)
  57.     dpos = INSTR(curpos, copy, delim)
  58.     DO UNTIL dpos = 0
  59.         arr(arrpos) = MID$(copy, curpos, dpos - curpos)
  60.         arrpos = arrpos + 1
  61.         REDIM _PRESERVE arr(arrpos + 1) AS STRING
  62.         curpos = dpos + LEN(delim)
  63.         dpos = INSTR(curpos, copy, delim)
  64.     LOOP
  65.     arr(arrpos) = MID$(copy, curpos)
  66.     REDIM _PRESERVE arr(arrpos) AS STRING
  67.  
Title: Re: WordCracker
Post by: codeguy on October 08, 2018, 12:23:37 am
generates strings and retrieves matches for permutations of letters creating words up to 8 characters long. in order too. On my humble machine, (39/64)s. Because permutations are generated in lexical order AND the wordlist is in lexical order (both ascending), this becomes nothing more than a simple merge/find algorithm, the same used for batch database updates. Quick? Judge for yourself. found 58 matches in 322560 variable-length permutations. I am uncertain how many loops/second this code executes, but I'm gonna say it's 322560/(39/64), roughly 529329 trials/second on my humble 2.16GHz machine. On my machine, this will also execute about 8 times, perhaps a wee bit more. I am uncertain how fast Steve's CPU is, but mine is no speed demon nor is it super slow. Mine is roughly 240,000 trials/GHz.
Code: QB64: [Select]
  1. WIDTH 80, 43
  2. DIM parray(0 TO 7) AS LONG '* checks words up to 8 characters long
  3. theword$ = "sequoias"
  4. FOR i = 0 TO UBOUND(parray)
  5.     parray(i) = ASC(theword$, i + 1)
  6. s& = LBOUND(parray)
  7. h& = UBOUND(parray)
  8.     an& = s& + 1
  9.     FOR q& = an& TO h&
  10.         IF parray(q&) < parray(s&) THEN
  11.             SWAP parray(q&), parray(s&)
  12.         END IF
  13.     NEXT
  14.     s& = an&
  15. LOOP WHILE s& <= h&
  16. Wlist% = FREEFILE
  17. OPEN ".\wordlist.txt" FOR BINARY AS Wlist%
  18. PRINT LOF(Wlist%)
  19. chunk$ = INPUT$(LOF(Wlist%), Wlist%)
  20. CLOSE Wlist%
  21. REDIM words$(0 TO 999999)
  22. Wct& = 0
  23. FOR u& = 1 TO LEN(chunk$)
  24.     IF ASC(chunk$, u&) = 10 OR u& > LEN(chunk$) THEN
  25.         IF LEN(w$) <= 8 THEN
  26.             words$(Wct&) = w$
  27.             PRINT w$; Wct&
  28.             Wct& = Wct& + 1
  29.         END IF
  30.         w$ = ""
  31.     ELSE
  32.         w$ = w$ + MID$(chunk$, u&, 1)
  33.     END IF
  34. chunk$ = ""
  35. WordsIndex& = 0
  36. nmatches& = 0
  37. '_DELAY 60
  38. t! = TIMER(.001)
  39. PRINT "permutations"
  40. Permute parray(), 0, UBOUND(parray), np#, words$(), WordIndex&, Wct&, nmatches&, npermtrials#
  41. f! = TIMER(.001)
  42. PRINT f! - t!; nmatches&; np#; npermtrials#
  43.     x$ = INKEY$
  44. LOOP UNTIL x$ > ""
  45.  
  46. SUB DisplayResults (PArray() AS LONG, start, finish, np AS DOUBLE)
  47.     DIM i AS LONG
  48.     PRINT USING "#,###,###,###,###"; np;
  49.     FOR i = LBOUND(parray) TO UBOUND(parray)
  50.         PRINT PArray(i);
  51.     NEXT
  52.     PRINT
  53.  
  54. SUB Rotate (parray() AS LONG, Start AS LONG, finish AS LONG)
  55.     DIM ts AS LONG
  56.     ts = parray(Start)
  57.     FOR i = Start TO finish - 1
  58.         SWAP parray(i), parray(i + 1)
  59.     NEXT
  60.     parray(finish) = ts
  61.  
  62. SUB Permute (parray() AS LONG, start AS LONG, finish AS LONG, np AS DOUBLE, words$(), index&, wct&, matchcount&, NPermsTried#)
  63.     np = np + 1
  64.     IF index& < wct& THEN
  65.         FOR a& = 0 TO UBOUND(parray)
  66.             v$ = array2word$(parray(), LBOUND(parray), LBOUND(parray) + a&)
  67.             DO
  68.                 IF words$(index&) < v$ THEN
  69.                     index& = index& + 1
  70.                 ELSE
  71.                     NPermsTried# = NPermsTried# + 1
  72.                     IF words$(index&) = v$ THEN
  73.                         matchcount& = matchcount& + 1
  74.                         PRINT v$; ":match:"; words$(index&); index&; matchcount&
  75.                     END IF
  76.                     EXIT DO
  77.                 END IF
  78.             LOOP
  79.         NEXT
  80.     ELSE
  81.         EXIT SUB
  82.     END IF
  83.     IF start < finish THEN
  84.         DIM i AS LONG
  85.         DIM j AS LONG
  86.         FOR i = finish - 1 TO start STEP -1
  87.             FOR j = i + 1 TO finish
  88.                 SWAP parray(i), parray(j)
  89.                 Permute parray(), i + 1, finish, np, words$(), index&, wct&, matchcount&, NPermsTried#
  90.             NEXT
  91.             Rotate parray(), i, finish
  92.         NEXT
  93.     END IF
  94.  
  95. FUNCTION array2word$ (parray() AS LONG, start AS LONG, finish AS LONG)
  96.     m$ = SPACE$(finish - start + 1)
  97.     POSITION& = 1
  98.     FOR z& = start TO finish
  99.         MID$(m$, POSITION&) = CHR$(parray(z&))
  100.         POSITION& = POSITION& + 1
  101.     NEXT
  102.     array2word$ = m$
  103.  
Title: Re: WordCracker
Post by: Zeppelin on October 08, 2018, 01:11:38 am
Wow. Thanks everyone for their responses. Thanks for the performance tips (It really helps. I still learning).
Its nice to see everyone trying to out-do each others programs.

I have to say SMcNeill..... what do you do with your life.

Just joking man. Unbelievably fast program by the way.

Thanks everyone,
Zeppelin
Title: Re: WordCracker
Post by: bplus on October 08, 2018, 10:10:05 am
Made some more improvements like getting rid of blank word and preprocessed file according to number of letters input to build words from. "abcdefghi" now runs around 518 loops in 5 secs and finds 351 words from "Steve McNeil" input. Steve's code crashes if "Steve McNeil" is input. ;-))
Code: QB64: [Select]
  1. _TITLE "WordCrack mod 2 with preprocessing.bas B+ 2018-10-08"
  2.  
  3. ' now with timer mod and preprocessing of word file to length of in$
  4.  
  5. 'INPUT "Enter a string to build words from "; inp$
  6. inp$ = "Steve McNeil"
  7. in$ = LCASE$(inp$)
  8. lenin = LEN(in$)
  9.  
  10. 'load words
  11. OPEN "WordList.txt" FOR BINARY AS #1
  12. gulp& = LOF(1)
  13. buff$ = STRING$(gulp&, " ")
  14. GET #1, , buff$
  15. REDIM longWords$(0)
  16. Split buff$, CHR$(10), longWords$()
  17.  
  18. 'preprocess word list, use only words <= the build words string
  19. DIM word$(84100)
  20. FOR i& = 0 TO UBOUND(longWords$)
  21.     IF LEN(longWords$(i&)) <= lenin THEN
  22.         IF LTRIM$(longWords$(i&)) <> "" THEN
  23.             IF ASC(longWords$(i&)) > 96 AND ASC(longWords$(i&)) < 123 THEN
  24.                 wi& = wi& + 1
  25.                 word$(wi&) = longWords$(i&)
  26.             END IF
  27.         END IF
  28.     END IF
  29.  
  30. 'NOW do the loop count, should add lots of loops checking one less thing per loop and looping less times
  31. start! = TIMER
  32. WHILE TIMER - start! < 5
  33.     'RUN THROUGH WORDS
  34.     foundWords% = 0
  35.     FOR n& = 0 TO wi&
  36.         c$ = in$
  37.         OK% = -1
  38.         FOR i% = 1 TO LEN(word$(n&))
  39.             p% = INSTR(c$, MID$(word$(n&), i%, 1))
  40.             IF p% = 0 THEN
  41.                 OK% = 0: EXIT FOR
  42.             ELSE
  43.                 MID$(c$, p%, 1) = "+"
  44.             END IF
  45.         NEXT
  46.         IF OK% THEN foundWords% = foundWords% + 1: PRINT word$(n&); ", ";
  47.     NEXT
  48.     counter% = counter% + 1
  49. PRINT: PRINT: PRINT "Found:"; foundWords%; "words in "; CHR$(34); inp$; CHR$(34); ","; counter%; "times in 5 secs."
  50.  
  51. SUB Split (mystr AS STRING, delim AS STRING, arr() AS STRING)
  52.     ' bplus modifications of Galleon fix of Bulrush Split reply #13
  53.     ' http://www.[abandoned, outdated and now likely malicious qb64 dot net website - don’t go there]/forum/index.php?topic=1612.0
  54.     ' this sub further developed and tested here: \test\Strings\Split test.bas
  55.     DIM copy AS STRING, p AS LONG, curpos AS LONG, arrpos AS LONG, lc AS LONG, dpos AS LONG
  56.     copy = mystr 'make copy since we are messing with mystr
  57.     'special case if delim is space, probably want to remove all excess space
  58.     IF delim = " " THEN
  59.         copy = RTRIM$(LTRIM$(copy))
  60.         p = INSTR(copy, "  ")
  61.         WHILE p > 0
  62.             copy = MID$(copy, 1, p - 1) + MID$(copy, p + 1)
  63.             p = INSTR(copy, "  ")
  64.         WEND
  65.     END IF
  66.     curpos = 1
  67.     arrpos = 0
  68.     lc = LEN(copy)
  69.     dpos = INSTR(curpos, copy, delim)
  70.     DO UNTIL dpos = 0
  71.         arr(arrpos) = MID$(copy, curpos, dpos - curpos)
  72.         arrpos = arrpos + 1
  73.         REDIM _PRESERVE arr(arrpos + 1) AS STRING
  74.         curpos = dpos + LEN(delim)
  75.         dpos = INSTR(curpos, copy, delim)
  76.     LOOP
  77.     arr(arrpos) = MID$(copy, curpos)
  78.     REDIM _PRESERVE arr(arrpos) AS STRING
  79.  

PS codeguy's code is going to crash also if only permutations up to 8 letters is allowed, without even checking I can see that.

Less than 4 hours to go. Will Steve fix his code in time? Stay tuned...  ;)
Title: Re: WordCracker
Post by: bplus on October 08, 2018, 10:23:45 am
Steve wins again! My name only has 271 words in it. :) (but none are vile)
Title: Re: WordCracker
Post by: SMcNeill on October 08, 2018, 11:27:46 am
Modified to run with any size string (even "Steve McNeil"; which, btw, is named WRONG.  /cry!!):

Code: QB64: [Select]
  1. WIDTH 80, 50
  2. DEFLNG A-Z
  3. 'SET KEY AND LETTERS
  4. keys = 0 ' ASC("S") OR &B00100000 '0 for no key limiter, otherwise use the ASC("letter") OR &B00100000 to force lowercase value
  5. in$ = LCASE$("SteveMcNeil")
  6. lenin = LEN(in$) '<<<<<<<<<<< for faster processing
  7.  
  8.  
  9. DIM word$(84100)
  10. DIM SHARED ltr(1 TO lenin) '<<<<<<<<<<< shared to debug
  11. DIM backup(1 TO lenin)
  12. DIM keycount(97 TO 122, 60000) 'letter/index
  13. DIM m1 AS _MEM, m2 AS _MEM 'just for quick array restoration
  14. m1 = _MEM(ltr()): m2 = _MEM(backup())
  15.  
  16.  
  17. 'SPLIT UP LETTERS
  18. FOR n = 1 TO lenin
  19.     ltr(n) = ASC(in$, n) OR &B00100000
  20. _MEMPUT m2, m2.OFFSET, ltr()
  21.  
  22.  
  23. 'LOAD ALL WORDS FROM .TXT FILE
  24. PRINT "LOADING...."
  25. OPEN "WordList.txt" FOR BINARY AS #1 '<<<<<<<<< moved to more appropriate place
  26. filecount = 0
  27.     LINE INPUT #1, line$
  28.     IF LEN(line$) <= lenin THEN
  29.         filecount = filecount + 1
  30.         word$(filecount) = line$
  31.         FOR i = 97 TO 122
  32.             IF INSTR(line$, CHR$(i)) THEN
  33.                 keycount(i, 0) = keycount(i, 0) + 1 'record 0 tells us how many there are for that letter
  34.                 keycount(i, keycount(i, 0)) = filecount 'add the word number to the proper index
  35.             END IF
  36.         NEXT
  37.     END IF
  38.  
  39.  
  40. t# = TIMER
  41. timelimit = 5
  42.  
  43.  
  44. DO UNTIL TIMER > t# + timelimit
  45.     CLS
  46.     loopsran = loopsran + 1
  47.     totalcount = 0
  48.  
  49.  
  50.     'RUN THROUGH WORDS
  51.     IF keys = 0 THEN k = filecount ELSE k = keycount(keys, 0)
  52.     FOR n = 1 TO k
  53.         w$ = word$(n)
  54.         FOR i = 1 TO LEN(w$)
  55.             templtr = ASC(w$, i)
  56.             FOR x = 1 TO lenin
  57.                 IF ltr(x) = templtr THEN ltr(x) = 0: GOTO stillvalid
  58.             NEXT x
  59.             GOTO skipmore
  60.             stillvalid:
  61.         NEXT i
  62.  
  63.         totalcount = totalcount + 1
  64.         PRINT w$,
  65.  
  66.         skipmore:
  67.         $CHECKING:OFF
  68.         _MEMPUT m1, m1.OFFSET, backup()
  69.         $CHECKING:ON
  70.         count = 0
  71.     NEXT n
  72.  
  73.     PRINT "DONE..."
  74. PRINT USING "###,###,###,###,### loops ran in ##.# seconds"; loopsran, timelimit
  75. PRINT totalcount; "matches found for "; in$

About 300 loops in 5 seconds with the longer name giving us more letters to check against.  Oddly enough, my list is only counting 350 words for us, and not 351 as Bplus says his is generating.  Is this a glitch in my counting method?  His?  Or is somebody leaving a word out, or including a false positive?   

I dunno!

Some digging will be required to see where that extra word comes from, and what the heck it is!  :P
Title: Re: WordCracker
Post by: SMcNeill on October 08, 2018, 11:41:33 am
Testing shows that Bplus is wrong.   :D

Compare the 2 file dumps (the first is mine, the second is Bplus's), and it's immediately obvious what the issue is, right at the very top of the file -- a BLANK word.  There's 350 words, with an extra "" at the top of his list.   

Steve wins again!  ;D
Title: Re: WordCracker
Post by: bplus on October 08, 2018, 12:09:55 pm
Cra...

Sorry about your name Steve, my eyes are going bad.

Wait,  , isn't a word? ;-))  (where the heck did that come from?)

Thanks for heads up!

Title: Re: WordCracker
Post by: SMcNeill on October 08, 2018, 12:44:28 pm
Cra...

Sorry about your name Steve, my eyes are going bad.

No worries; it's the story of my life...  My family has lived here since forever, so when it was time to name the roads, the powers that be named the road here after my family...

And, spelt it wrong!

No kidding!!

You can find me on McNeil Hill Rd, still laughing at the irony of being immortalized incorrectly.

*****************

And funniest thing?? 

The local government says we can change the name, as long as WE are willing to pay the $$$$$ to change the road signs.

Two little signs, only needing an extra "l" in them, and yet to comply to state standards, they cost over $2600 each!

It's no damn wonder our government is broke, trying to hit such costs just to stay within regulation.  On a low traveled, rural as heck road like mine, a $2.00 slab of wood, $1 can of white paint, and $1 can of black paint for lettering, would be more than sufficient to serve the needs of the community...

And yet...  The sign has to be a certain gauge steel, painted X coats of reflective green, with Y coats of Z-size white paint, mixed with XX percent reflective beads....

(And, just in case you want to see how nutty regulations are for government stuff, here's the regs on signs:  http://www.vdot.virginia.gov/business/resources/TED/final_MUTCD/2013_sup/Revision_1_Part_2_Signs.pdf --- Well, it's the BASIC regulations.  There are 14 supplements, with 5 supplements to supplement the supplements also...l)
Title: Re: WordCracker
Post by: Cobalt on October 08, 2018, 01:04:32 pm
And funniest thing?? 

The local government says we can change the name, as long as WE are willing to pay the $$$$$ to change the road signs.

Two little signs, only needing an extra "l" in them, and yet to comply to state standards, they cost over $2600 each!

It's no damn wonder our government is broke, trying to hit such costs just to stay within regulation.  On a low traveled, rural as heck road like mine, a $2.00 slab of wood, $1 can of white paint, and $1 can of black paint for lettering, would be more than sufficient to serve the needs of the community...

And yet...  The sign has to be a certain gauge steel, painted X coats of reflective green, with Y coats of Z-size white paint, mixed with XX percent reflective beads....

(And, just in case you want to see how nutty regulations are for government stuff, here's the regs on signs:  http://www.vdot.virginia.gov/business/resources/TED/final_MUTCD/2013_sup/Revision_1_Part_2_Signs.pdf --- Well, it's the BASIC regulations.  There are 14 supplements, with 5 supplements to supplement the supplements also...l)

just do what the kids do, take some paint and add your own extra 'l', find some spray paint thats lying around it costs you nothing! unless you get caught and fined. or go the extra mile with some masking tape so it looks better!
175 pages, does that meet novel requirements too?
Title: Re: WordCracker
Post by: bplus on October 08, 2018, 03:29:34 pm
Yep, for shorter strings Steve's loop count is a winner!

But for longer strings this current version is a winner.
Code: QB64: [Select]
  1. WIDTH 80, 50
  2. _TITLE "WordCrack mod 2 with preprocessing.bas B+ 2018-10-08"
  3.  
  4. ' now with timer mod and preprocessing of word file to length of in$
  5.  
  6. ' fix the extra "" word being printed   12:28 PM
  7. ' use same print method as Steve's code, change type for wordsFound
  8.  
  9. 'INPUT "Enter a string to build words from "; inp$
  10. inp$ = "thequickbrownfoxjumpedoverthelazydog"
  11. 'inp$ = "thequickbrownfoxjumpedoverthelazydogdreamingofcountingsheep"
  12. 'inp$ = "thequickbrownfoxjumpedoverthelazydogdreamingofcountingsheepthinkingofbrownfoxjumpingoverdogs"
  13. in$ = LCASE$(inp$)
  14. lenin = LEN(in$)
  15.  
  16. 'load words
  17. OPEN "WordList.txt" FOR BINARY AS #1
  18. gulp& = LOF(1)
  19. buff$ = STRING$(gulp&, " ")
  20. GET #1, , buff$
  21. REDIM longWords$(0)
  22. Split buff$, CHR$(10), longWords$()
  23.  
  24. 'preprocess word list, use only words <= the build words string
  25. DIM word$(84100)
  26. FOR i& = 0 TO UBOUND(longWords$)
  27.     IF LEN(longWords$(i&)) <= lenin THEN
  28.         IF LTRIM$(longWords$(i&)) <> "" THEN
  29.             IF ASC(longWords$(i&)) > 96 AND ASC(longWords$(i&)) < 123 THEN
  30.                 wi& = wi& + 1
  31.                 word$(wi&) = longWords$(i&)
  32.             END IF
  33.         END IF
  34.     END IF
  35.  
  36. 'NOW do the loop count, should add lots of loops checking one less thing per loop and looping less times
  37. start! = TIMER
  38. WHILE TIMER - start! < 5
  39.     'RUN THROUGH WORDS
  40.     foundWords& = 0
  41.     FOR n& = 1 TO wi& '<<<<<<<<<<<<<<<<<<<  fixed from 0 to 1
  42.         c$ = in$
  43.         OK% = -1
  44.         FOR i% = 1 TO LEN(word$(n&))
  45.             p% = INSTR(c$, MID$(word$(n&), i%, 1))
  46.             IF p% = 0 THEN
  47.                 OK% = 0: EXIT FOR
  48.             ELSE
  49.                 MID$(c$, p%, 1) = "+"
  50.             END IF
  51.         NEXT
  52.         IF OK% THEN foundWords& = foundWords& + 1: PRINT word$(n&),
  53.     NEXT
  54.     counter& = counter& + 1
  55. PRINT: PRINT: PRINT "Found:"; foundWords&; "words in "; CHR$(34); inp$; CHR$(34); ","; counter&; "times in 5 secs."
  56.  
  57. SUB Split (mystr AS STRING, delim AS STRING, arr() AS STRING)
  58.     ' bplus modifications of Galleon fix of Bulrush Split reply #13
  59.     ' http://www.[abandoned, outdated and now likely malicious qb64 dot net website - don’t go there]/forum/index.php?topic=1612.0
  60.     ' this sub further developed and tested here: \test\Strings\Split test.bas
  61.     DIM copy AS STRING, p AS LONG, curpos AS LONG, arrpos AS LONG, lc AS LONG, dpos AS LONG
  62.     copy = mystr 'make copy since we are messing with mystr
  63.     'special case if delim is space, probably want to remove all excess space
  64.     IF delim = " " THEN
  65.         copy = RTRIM$(LTRIM$(copy))
  66.         p = INSTR(copy, "  ")
  67.         WHILE p > 0
  68.             copy = MID$(copy, 1, p - 1) + MID$(copy, p + 1)
  69.             p = INSTR(copy, "  ")
  70.         WEND
  71.     END IF
  72.     curpos = 1
  73.     arrpos = 0
  74.     lc = LEN(copy)
  75.     dpos = INSTR(curpos, copy, delim)
  76.     DO UNTIL dpos = 0
  77.         arr(arrpos) = MID$(copy, curpos, dpos - curpos)
  78.         arrpos = arrpos + 1
  79.         REDIM _PRESERVE arr(arrpos + 1) AS STRING
  80.         curpos = dpos + LEN(delim)
  81.         dpos = INSTR(curpos, copy, delim)
  82.     LOOP
  83.     arr(arrpos) = MID$(copy, curpos)
  84.     REDIM _PRESERVE arr(arrpos) AS STRING
  85.  
  86.  

inp$ = "thequickbrownfoxjumpedoverthelazydog" > 74 loops versus 65 Steve's
'inp$ = "thequickbrownfoxjumpedoverthelazydogdreamingofcountingsheep" > 41 versus 36-37 Steve's
'inp$ = "thequickbrownfoxjumpedoverthelazydogdreamingofcountingsheepthinkingofbrownfoxjumpingoverdogs" > 37 versus 32-33 Steve's
Title: Re: WordCracker
Post by: SMcNeill on October 08, 2018, 03:40:47 pm
Quote
Yep, for shorter strings Steve's loop count is a winner!

But for longer strings this current version is a winner.

When dealing with such long strings, I'd definitely go a different route.  Read letters.  Preset an array.   Just compare letter counts.  I imagine it'd be a ton faster than how we're currently doing it, but that's because the program needs would be much different than originally stated.

I'll play around with it some and see just how quick a routine I can whip up for those longer words/phrases.  :)
Title: Re: WordCracker
Post by: codeguy on October 08, 2018, 03:58:27 pm
I got 361 words matching stevemcneill using the wordlist.txt as included in an attachment for this post. Somewhere, someone is wrong. Mine was modified to accommodate the added characters. Yes, it took a while as I did no optimizations for eliminating impossible prefixes and suffixes.
Title: Re: WordCracker
Post by: bplus on October 08, 2018, 04:22:34 pm
Hi codeguy,

For stevemcneill, 2 l's on end, I am getting 392 words with Steve's code and my version.

I don't know about just a letter? is t a word? I know I is. ;)
Title: Re: WordCracker
Post by: SMcNeill on October 08, 2018, 04:27:12 pm
Hi codeguy,

For stevemcneill, 2 l's on end, I am getting 392 words with Steve's code and my version.

I don't know about just a letter? is t a word? I know I is. ;)

Large dictionaries list each letter as a single-letter word. Each such word is defined as a noun, denoting the letter with which it is spelled.

"Psychology" starts with a p.

*********************

What if Q from Star Trek took the L train to the T intersection of C and D streets because he had a map where an X marked that spot? Would he get an A for following directions?
Title: Re: WordCracker
Post by: codeguy on October 08, 2018, 04:41:19 pm
You ARE using the original wordlist.txt from the original thread post https://www.qb64.org/forum/index.php?action=dlattach;topic=679.0;attach=1619 (https://www.qb64.org/forum/index.php?action=dlattach;topic=679.0;attach=1619), right? I got 361 unique matches using "stevemcneill" as the original string on my modified code, which was considerably slower BUT it does EXHAUSTIVE permutations of every letter, leading to quite large numbers, in fact the size of the job increases as a factorial.
Title: Re: WordCracker
Post by: bplus on October 08, 2018, 04:52:37 pm
Hi codeguy,

Is your code geared to handle the triple e and double l in stevemcneill? This is why I thought "nah!" for permutations.

Hi Steve,

You sure know how to drive home a point, N, S, E and W! ;)
Title: Re: WordCracker
Post by: codeguy on October 08, 2018, 05:23:50 pm
No, mine does unconditional permutations. Which is why it may have detected 3 more words than other submissions. But it does generate matches in lexical (alphabetical) order, saving sorting. For 8 characters or less, the speed is VERY similar to Steve's algorithm, running 8+ times to completion in the 5 second limit. Mine in 8 character mode, runs about .60s or less using the complete wordlist.txt document.
Title: Re: WordCracker
Post by: SMcNeill on October 08, 2018, 06:50:09 pm
My entry into the field of long-arse string searches:

Code: QB64: [Select]
  1. SCREEN _NEWIMAGE(800, 600, 32)
  2. CONST In$ = "thequickbrownfoxjumpedoverthelazydogdreamingofcountingsheepthinkingofbrownfoxjumpingoverdogs"
  3.  
  4. InLen = LEN(in$)
  5.  
  6.  
  7. OPEN "WordList.txt" FOR BINARY AS #1
  8.     LINE INPUT #1, junk$
  9.     WordCount = WordCount + 1
  10.     IF LEN(junk$) > maxlength THEN maxlength = LEN(junk$)
  11. SEEK 1, 1
  12.  
  13. TYPE WordLetters
  14.     letter AS _UNSIGNED _BYTE
  15.     count AS _UNSIGNED _BYTE
  16. DIM Words(WordCount) AS STRING, Match(WordCount) AS LONG
  17. DIM WordLetters(WordCount, 26) AS WordLetters
  18. DIM Letters(97 TO 122) AS _UNSIGNED _BYTE
  19.  
  20. FOR i = 1 TO WordCount 'Load and prepare our word list and wordsearch array
  21.     LINE INPUT #1, Words(i)
  22.     FOR j = 1 TO LEN(Words(i))
  23.         a = ASC(Words(i), j)
  24.         IF a > 96 AND a < 123 THEN
  25.             afound = 0
  26.             FOR k = 1 TO WordLetters(i, 0).count
  27.                 IF a = WordLetters(i, k).letter THEN afound = k: EXIT FOR
  28.             NEXT
  29.             IF afound THEN
  30.                 WordLetters(i, afound).count = WordLetters(i, afound).count + 1 'add to the count of an existing letter
  31.             ELSE
  32.                 WordLetters(i, 0).count = WordLetters(i, 0).count + 1 'increase the total count of letters
  33.                 w = WordLetters(i, 0).count
  34.                 WordLetters(i, w).letter = a 'save the letter
  35.                 WordLetters(i, w).count = 1 'and count it 1 for the first time it appeared
  36.             END IF
  37.         ELSE
  38.             WordLetters(i, 0).count = 0 'invalidate words with non A-Z letters
  39.             EXIT FOR
  40.         END IF
  41.     NEXT
  42.  
  43. FOR i = 1 TO InLen 'get the letters for the search string
  44.     a = (ASC(in$, i) OR 32)
  45.     Letters(a) = Letters(a) + 1
  46.  
  47.  
  48. t# = TIMER
  49. DO UNTIL TIMER > t# + 5
  50.     loopcount = loopcount + 1
  51.     wordfound = 0 'reset our counter
  52.  
  53.     'THE MAIN SEARCH ROUTINE HERE
  54.  
  55.     FOR i = 1 TO WordCount 'now check every word for a match
  56.         sl = WordLetters(i, 0).count 'search limit
  57.         IF sl = 0 THEN GOTO invalid
  58.         FOR j = 1 TO sl
  59.             l = WordLetters(i, j).letter 'the letter in the word
  60.             IF Letters(l) < WordLetters(i, j).count THEN GOTO invalid 'it's impossible
  61.         NEXT
  62.         wordfound = wordfound + 1
  63.         Match(wordfound) = i
  64.         invalid:
  65.     NEXT
  66.  
  67.     'END OF THE MAIN SEARCH ROUTINE
  68.  
  69.  
  70. 'Print Results
  71. FOR i = 1 TO wordfound
  72.     PRINT Words(Match(i));
  73.     IF i < wordfound THEN PRINT ","; ELSE PRINT
  74. PRINT "DONE, with"; wordfound; "matches, with a speed of"; loopcount; "runs in 5 seconds."
  75.  
  76.  
  77. 'And, if you want to see the secret behind this method, here's how we store and retrieve our data:
  78.  
  79. FOR i = 1 TO 10 'I think just showing how we track 10 words should be fine enough
  80.     PRINT Words(i),
  81.     FOR j = 1 TO WordLetters(i, 0).count
  82.         PRINT CHR$(WordLetters(i, j).letter);
  83.         PRINT WordLetters(i, j).count;
  84.         PRINT ",";
  85.     NEXT
  86.     PRINT
  87.  
  88.  
  89. 'and here's the in$ and how its letter count holds up
  90.  
  91.  
  92. PRINT in$
  93. FOR i = 97 TO 122
  94.     PRINT Letters(i);
  95.  

Instead of 32-37 runs in 5 seconds, this does about 150 on my machine.   

If you're curious how it does it, remove the END statement and then let it run.  It'll show you how we basically break down our words so we never have to check the letters in them any more times than is absolutely necessary to make a match.  ;)
Title: Re: WordCracker
Post by: SMcNeill on October 08, 2018, 07:30:11 pm
And, a slight tweak to do some pre-sorting based on In$ size and non-A-Z characters, and to add a key$ to make certain that the words all contain a valid letter we designate, as per the original post's requirements:

Code: QB64: [Select]
  1. SCREEN _NEWIMAGE(800, 600, 32)
  2. CONST In$ = "thequickbrownfoxjumpedoverthelazydogdreamingofcountingsheepthinkingofbrownfoxjumpingoverdogs"
  3. 'CONST In$ = "abcdefghi"
  4. 'key$ = "a" 'make "" or remark this line out completely to do a search without a required key, as per the original post
  5. InLen = LEN(In$)
  6.  
  7.  
  8. OPEN "WordList.txt" FOR BINARY AS #1
  9.     LINE INPUT #1, junk$
  10.     WordCount = WordCount + 1
  11.     IF LEN(junk$) > maxlength THEN maxlength = LEN(junk$)
  12. SEEK 1, 1
  13.  
  14. TYPE WordLetters
  15.     letter AS _UNSIGNED _BYTE
  16.     count AS _UNSIGNED _BYTE
  17. DIM Words(WordCount) AS STRING, Match(WordCount) AS LONG
  18. DIM WordLetters(WordCount, 17) AS WordLetters
  19. DIM Letters(97 TO 122) AS _UNSIGNED _BYTE
  20.  
  21.  
  22. FOR i = 1 TO WordCount
  23.     wc = wc + 1
  24.     LINE INPUT #1, Words(wc)
  25.     IF LEN(Words(wc)) > InLen THEN 'remove words too long to fit
  26.         wc = wc - 1
  27.     ELSE
  28.         FOR j = 1 TO LEN(Words(wc)) 'remove non A-Z words as we're only searching for words with those letters
  29.             IF ASC(Words(wc), j) < 97 OR ASC(Words(wc), j) > 122 THEN wc = wc - 1: EXIT FOR
  30.         NEXT
  31.     END IF
  32. WordCount = wc
  33.  
  34. FOR i = 1 TO WordCount 'Load and prepare our word list and wordsearch array
  35.     FOR j = 1 TO LEN(Words(i))
  36.         a = ASC(Words(i), j)
  37.         afound = 0
  38.         FOR k = 1 TO WordLetters(i, 0).count
  39.             IF a = WordLetters(i, k).letter THEN afound = k: EXIT FOR
  40.         NEXT
  41.         IF afound THEN
  42.             WordLetters(i, afound).count = WordLetters(i, afound).count + 1 'add to the count of an existing letter
  43.         ELSE
  44.             WordLetters(i, 0).count = WordLetters(i, 0).count + 1 'increase the total count of letters
  45.             w = WordLetters(i, 0).count
  46.             WordLetters(i, w).letter = a 'save the letter
  47.             WordLetters(i, w).count = 1 'and count it 1 for the first time it appeared
  48.         END IF
  49.     NEXT
  50.  
  51. FOR i = 1 TO InLen 'get the letters for the search string
  52.     a = (ASC(In$, i) OR 32)
  53.     Letters(a) = Letters(a) + 1
  54.  
  55.  
  56. t# = TIMER
  57. DO UNTIL TIMER > t# + 5
  58.     loopcount = loopcount + 1
  59.     wordfound = 0 'reset our counter
  60.  
  61.     'THE MAIN SEARCH ROUTINE HERE
  62.  
  63.     FOR i = 1 TO WordCount 'now check every word for a match
  64.         sl = WordLetters(i, 0).count 'search limit
  65.         FOR j = 1 TO sl
  66.             l = WordLetters(i, j).letter 'the letter in the word
  67.             IF Letters(l) < WordLetters(i, j).count THEN GOTO invalid 'it's impossible
  68.         NEXT
  69.         IF key$ = "" THEN
  70.             wordfound = wordfound + 1
  71.             Match(wordfound) = i
  72.         ELSEIF INSTR(Words(i), key$) <> 0 THEN
  73.             wordfound = wordfound + 1
  74.             Match(wordfound) = i
  75.         END IF
  76.         invalid:
  77.     NEXT
  78.  
  79.     'END OF THE MAIN SEARCH ROUTINE
  80.  
  81.  
  82.  
  83. 'Print Results
  84. FOR i = 1 TO wordfound
  85.     PRINT Words(Match(i));
  86.     IF i < wordfound THEN PRINT ","; ELSE PRINT
  87. PRINT "DONE, with"; wordfound; "matches, with a speed of"; loopcount; "runs in 5 seconds."
  88.  
  89.  
  90.  
  91.  
  92.  
  93. 'And, if you want to see the secret behind this method, here's how we store and retrieve our data:
  94.  
  95. FOR i = 1 TO 10 'I think just showing how we track 10 words should be fine enough
  96.     PRINT Words(i),
  97.     FOR j = 1 TO WordLetters(i, 0).count
  98.         PRINT CHR$(WordLetters(i, j).letter);
  99.         PRINT WordLetters(i, j).count;
  100.         PRINT ",",
  101.     NEXT
  102.     PRINT
  103.  
  104.  
  105. 'and here's the in$ and how its letter count holds up
  106.  
  107.  
  108. PRINT In$
  109. FOR i = 97 TO 122
  110.     PRINT CHR$(i); LTRIM$(RTRIM$(STR$(Letters(i))));
  111.     IF i < 122 THEN PRINT ", "; ELSE PRINT

We now run nice and fast for both short and long strings.  We can use a preset qualifier key if we want to, but we don't have to.  Run times are ~900 loops for "abcdefghi" with a key of "a", and ~200 times with the long entry Bplus used in his test routine earlier.
Title: Re: WordCracker
Post by: bplus on October 08, 2018, 07:34:55 pm
Yah Steve, 140 loops on my machine fabulous. Poor Peter, you robbed him to pay Paul. ;-))
(Steve, this comment based on your previous code post, you posted another while I was getting list ready for codeguy.)

Hi code guy,

Here is my list of 392 words from stevemcneill from the given WordList.txt file posted earlier in thread.
Hope this helps you find differences in lists.

PS what is unconditional permutation?

Title: Re: WordCracker
Post by: bplus on October 08, 2018, 07:59:49 pm
Some time ago, I built a Word Search Solver and had thought about making a Word Search Puzzle Builder.

With this Word Crack code review, I am rethinking building a Word Search Puzzle Builder based on a longish quote. Hmm... a few more details to work out... but have good word list generator now and if you throw in allot of extra of the same letters, could be a quite a challenge, like getting a puzzle where all the pieces look the same.
Title: Re: WordCracker
Post by: SMcNeill on October 08, 2018, 08:19:52 pm
And for those who are having trouble understanding how my last 2 examples work, the secret is in the letter counter.

Take APPLE as an example...  Instead of searching to see if the In$ has those 5 letters, we instead count the letters first, when we load the dictionary.  1 A, 2 P, 1 L, 1 E...    All we have to check is a maximum of 4 times, instead of 5; we just check to make certain In$ has 2 Ps in it once, rather than checking twice...

In some cases, this reduces a ton of letter checks.   MISSISSIPPI.  Instead of 11 letters, we check 4...   (1 M, 4 I, 4 S, 2 P).

Minimal conditional checking makes for maximum performance.  ;)
Title: Re: WordCracker
Post by: bplus on October 08, 2018, 08:32:47 pm
And for those who are having trouble understanding how my last 2 examples work, the secret is in the letter counter.

Take APPLE as an example...  Instead of searching to see if the In$ has those 5 letters, we instead count the letters first, when we load the dictionary.  1 A, 2 P, 1 L, 1 E...    All we have to check is a maximum of 4 times, instead of 5; we just check to make certain In$ has 2 Ps in it once, rather than checking twice...

In some cases, this reduces a ton of letter checks.   MISSISSIPPI.  Instead of 11 letters, we check 4...   (1 M, 4 I, 4 S, 2 P).

Minimal conditional checking makes for maximum performance.  ;)

PLUS, if you want to save even more time, save the processed Word List back into a new Data File, once and forever preprocessed!

Peter smiles, he ain't feel'in so broke no more.
Title: Re: WordCracker
Post by: codeguy on October 08, 2018, 08:49:01 pm
Unconditional permutation means I perform no checking as to whether the next generated permutation actually fits the pattern of the language. In essence, it will check for permutations of words beginning with rx or some other beginning consonant pair that does not appear in standard language. This is handy for finding abbreviations like RPM or GPS or even TLDR, which are not words in the standard sense, but is included in the English language as shorthand for Revolutions Per Minute Global Positioning System and Too Late Didn't READ. Even stuff like RTFM, whose translation I will leave to the reader. While my method is slow for very large strings, it is absolutely thorough and can work for languages that are not English too. Slow? Yes, it is slow, but it will not skip any possible permutations that could lead to missing words in a word list. This is why it's considered an exhaustive algorithm. Also, my method eliminates any chance for repetitions of words found. BTW, my name has 871 exactly matching words that are in wordlist.txt, among them. Sanhedrin, Idaho and ordinals. Weird, huh? But for words of 8 characters or less, using the permutation method and my searching algorithm is competitive to Steve's work. This word list does not contain axolotl, a kind of salamander (301 words in wordlist.txt), but 32 words contained in axolotl were found. With my algorithm, if it's in there, this will find it AND give results in sorted order.
Title: Re: WordCracker
Post by: bplus on October 08, 2018, 09:05:36 pm
Thanks codeguy,

As the man said, "Minimal conditional checking makes for maximum performance.  ;)  "

Oh hey, I've got try this with my middle name too, 1106! including roman, sagamen.
Title: Re: WordCracker
Post by: SMcNeill on October 08, 2018, 09:33:10 pm
And for those who are having trouble understanding how my last 2 examples work, the secret is in the letter counter.

Take APPLE as an example...  Instead of searching to see if the In$ has those 5 letters, we instead count the letters first, when we load the dictionary.  1 A, 2 P, 1 L, 1 E...    All we have to check is a maximum of 4 times, instead of 5; we just check to make certain In$ has 2 Ps in it once, rather than checking twice...

In some cases, this reduces a ton of letter checks.   MISSISSIPPI.  Instead of 11 letters, we check 4...   (1 M, 4 I, 4 S, 2 P).

Minimal conditional checking makes for maximum performance.  ;)

PLUS, if you want to save even more time, save the processed Word List back into a new Data File, once and forever preprocessed!

Peter smiles, he ain't feel'in so broke no more.

If I were to save it as a processed list, I'd sort it first to put maximum letters first.

Apple has 2Ps and only one of every other letter.  Since double letters are rarer than single letters, if we check for the PP first, we're more likely to be able to skip the rest of the checks.

So, instead of 1A2P1L1E, I'd store it as 2P1A1E1L....

Take ELEVEN for a perfect example.  It's rare to see 4 Es in a word, but not so rare to find a single L, V, or N.  Check it first and chances are you can skip all the other letters completely.

***************

Edit: Scrabble letter values would actually be a good criteria for sorting order.  Z, X, Q at the front of the search list, A, E, I, O, U, S, T, and such as the last things compared.

I imagine you could eek out a considerable performance boost with minimal effort, implementing such a method.
Title: Re: WordCracker
Post by: codeguy on October 08, 2018, 11:10:34 pm
Steve, I was really impressed with your speedy performance, so I took the liberty of modifying it for exact same results and significantly faster performance.
Code: QB64: [Select]
  1.  
  2. SCREEN _NEWIMAGE(800, 600, 32)
  3. CONST In$ = "thequickbrownfoxjumpedoverthelazydogdreamingofcountingsheepthinkingofbrownfoxjumpingoverdogs"
  4. DIM wordcount AS LONG: wordcount = 0
  5. DIM maxlength AS LONG: maxlength = 0
  6. InLen = LEN(In$)
  7.  
  8.  
  9. OPEN "WordList.txt" FOR BINARY AS #1
  10.     LINE INPUT #1, junk$
  11.     wordcount = wordcount + 1
  12.     IF LEN(junk$) > maxlength THEN maxlength = LEN(junk$)
  13. SEEK 1, 1
  14.  
  15. TYPE WordLetters
  16.     letter AS _UNSIGNED _BYTE
  17.     count AS _UNSIGNED _BYTE
  18. DIM Words(wordcount) AS STRING, Match(wordcount) AS LONG
  19. DIM WordLetters(wordcount, 26) AS WordLetters
  20. DIM Letters(97 TO 122) AS _UNSIGNED _BYTE
  21. DIM KIterWordLettersCount AS LONG
  22. DIM afound AS LONG: afound = 0
  23. FOR i = 1 TO wordcount 'Load and prepare our word list and wordsearch array
  24.     LINE INPUT #1, Words(i)
  25.     FOR j = 1 TO LEN(Words(i))
  26.         ascii = ASC(Words(i), j)
  27.         SELECT CASE ascii
  28.             CASE 97 TO 122
  29.                 FOR KIterWordLettersCount = 1 TO WordLetters(i, 0).count
  30.                     IF ascii = WordLetters(i, KIterWordLettersCount).letter THEN afound = KIterWordLettersCount: EXIT FOR
  31.                 NEXT
  32.                 IF afound THEN
  33.                     WordLetters(i, afound).count = WordLetters(i, afound).count + 1 'add to the count of an existing letter
  34.                     afound = 0
  35.                 ELSE
  36.                     WordLetters(i, 0).count = WordLetters(i, 0).count + 1 'increase the total count of letters
  37.                     w = WordLetters(i, 0).count
  38.                     WordLetters(i, w).letter = ascii 'save the letter
  39.                     WordLetters(i, w).count = 1 'and count it 1 for the first time it appeared
  40.                 END IF
  41.             CASE ELSE
  42.                 WordLetters(i, 0).count = 0 'invalidate words with non A-Z letters
  43.                 EXIT FOR
  44.         END SELECT
  45.     NEXT
  46.  
  47. FOR i = 1 TO InLen 'get the letters for the search string
  48.     ascii = (ASC(In$, i) OR 32)
  49.     Letters(ascii) = Letters(ascii) + 1
  50.  
  51. DIM wordfound AS LONG
  52. t# = TIMER(.001)
  53. DO UNTIL TIMER > t# + 5
  54.     loopcount = loopcount + 1
  55.     wordfound = 0 'reset our counter
  56.  
  57.     'THE MAIN SEARCH ROUTINE HERE
  58.  
  59.     FOR i = 1 TO wordcount 'now check every word for a match
  60.         sl = WordLetters(i, 0).count 'search limit
  61.         IF sl THEN
  62.             FOR j = 1 TO sl
  63.                 l = WordLetters(i, j).letter 'the letter in the word
  64.                 IF Letters(l) < WordLetters(i, j).count THEN GOTO invalid 'it's impossible
  65.             NEXT
  66.             wordfound = wordfound + 1
  67.             Match(wordfound) = i
  68.         END IF
  69.         invalid:
  70.     NEXT
  71.  
  72.     'END OF THE MAIN SEARCH ROUTINE
  73.  
  74.  
  75. 'Print Results
  76. FOR i = 1 TO wordfound
  77.     PRINT Words(Match(i));
  78.     IF i < wordfound THEN PRINT ","; ELSE PRINT
  79. PRINT "DONE, with"; wordfound; "matches, with a speed of"; loopcount; "runs in 5 seconds."
  80.  
  81.  
  82. 'And, if you want to see the secret behind this method, here's how we store and retrieve our data:
  83.  
  84. FOR i = 1 TO 10 'I think just showing how we track 10 words should be fine enough
  85.     PRINT Words(i),
  86.     FOR j = 1 TO WordLetters(i, 0).count
  87.         PRINT CHR$(WordLetters(i, j).letter);
  88.         PRINT WordLetters(i, j).count;
  89.         PRINT ",";
  90.     NEXT
  91.     PRINT
  92.  
  93.  
  94. 'and here's the in$ and how its letter count holds up
  95.  
  96.  
  97. PRINT In$
  98. FOR i = 97 TO 122
  99.     PRINT Letters(i);
  100.  
  101.  
On my humble machine, this represents a 45 loops/5s to 80+ loops/5s. Awesome work, Steve.
Title: Re: WordCracker
Post by: SMcNeill on October 09, 2018, 12:13:36 am
Thanks for the kind words, Codeguy.  ;)

I've worked with datasets like this thousands of times in the past, so I've learned a few tricks for making them run efficiently.  The above was "speedy enough" for most needs, but there's methods quite a bit faster we could employ -- if we wanted to put forth the effort and alter our data somewhat.

Absolute fastest method I can imagine is by dividing our data into a tree structure....

For example, let's start with this tree:

A
AA
AAA

The first 3 entries on our list are those three.  By "treeing" our data, we say, "If I don't have A in the search phrase, then I can't have anything below A"

Eliminate "A" and we eliminate EVERYTHING with an A.  Our search list just dropped 50k words.

If we have A, but not AA, we've eliminated all words with AA from our search list...

It's a "cascading elimination" scheme and it's efficient, and fast, as heck!

The main issue with it is generating the lookup table to begin with...  Your data would need to be stored in a similar manner to this:
A (the word), 52154 (number of words with this eliminator), 2,3,4,5,6.... (Word list)
AA (next word), 2154 (number of words with this eliminator), 3,44,67,87,... (Word list)

**********************

It would bloat our data file considerably, depending on how many "eliminators" we want to use (why use anything more than 2 digits?  Longer words get more unique, the longer they become.), but it'd reduce our list of possible words to check by huge chunks at a time...

Fastest method I can think of, at the moment anyway.  ;)

(And if you look at my previous code, you can see where I was already generating lists which we could use for elimination purposes for single letters back with the original code in message #18.)
Title: Re: WordCracker
Post by: bplus on October 10, 2018, 01:39:23 pm
This letter count number formula thing lends itself obviously to anagrams, so yesterday I started modifying the file with BFormulas  a 26 chr$() string of counts for a, b, c... and all day long as I proceeded through with code tests, I had strongest feeling of Deja Vu  that I had done this before, that we, Steve too, had worked through this before, probably at Walter's forum. I check through old code posts 3 times and do not find anything on Anagrams... so I keep going reluctantly because it becomes more and more clear we had done this before.

Finally late last night, I do find the old code posted under Rosetta Folder! Yes, that's right because it started from a challenge from Rosetta stuff we were doing. Dang! I failed to remember the biggest hint of all, to sort by the BFormula$ key. Man what a time saver it is doing it that way, like a blink of the eye!

So if you find a word in your name or you are word building from a string, every anagram of it is automatically included. So along with filing the WordList with the BFormula key at start for saving allot of time, some more time saving could be made by listing all the anagrams that come with such a BFormula! WordList was reduced by 6,500+ lines by listing anagrams. Can't wait to try timed tests for generating words lists.