Active Forums => Programs => Topic started by: Zeppelin on October 07, 2018, 03:56:15 am

Title: WordCracker
Post by: Zeppelin on October 07, 2018, 03:56:15 am

Hey,
I've run into an issue with my program.
I'm trying to make a program that, when given a string of 9 letters and 1 key letter, prints out all the possible words using these letters.
For example:
9 Letters: ABCDEFGHI
Key Letter: A
PRINTS:
BAD
CAB
DEAF
etc....

I am using a .txt database full of words from the Oxford dictionary, and each time I run the program nothing is output. I can't find the issue.

Thanks,
Zeppelin

P.S. The program and wordlist are attached below.

Title: Re: WordCracker
Post by: SMcNeill on October 07, 2018, 07:37:00 am

http://qb64.freeforums.net/thread/42/scrabble-word-maker -- Sounds like what you're wanting is basically the same thing I did here. Give it a look and see if it helps.

Title: Re: WordCracker
Post by: bplus on October 07, 2018, 10:33:25 am

Code: QB64:

'SPLIT UP LETTERS
FOR n = 1 TO 9
    ltr$(n) = MID$(in$, n, 1)
NEXT n
PRINT LEN(ltr$)  '>>>>>>>>>>>> 0 !!!
END
 

Code: QB64:

            FOR x = 1 TO LEN(in$)  '<<<<<<<<<<<<<<<<<<<<<<< change to this?
                IF ltr$(x) = templtr$ THEN
                    ltr$(x) = ""
                    count = count + 1
                END IF
            NEXT x
 

OH!!! This is screwing you up too!

Code: QB64:

    FOR n = 1 TO 9
        ltr$(n) = MID$(in$, n, 1)
    NEXT n

You are using n for the word index from the file, and then changing n here when resetting ltr$() array!!!

Title: Re: WordCracker
Post by: bplus on October 07, 2018, 10:47:30 am

OK now get stuff printed in reasonable time!

Code: QB64:

 
DIM word$(84100)
DIM key$
DIM SHARED ltr$(9)  '<<<<<<<<<<< shared to debug
 
'SET KEY AND LETTERS
key$ = "a"
in$ = "abcdefghi"
lenin = LEN(in$) '<<<<<<<<<<< for faster processing
 
'SPLIT UP LETTERS
FOR n = 1 TO 9
    ltr$(n) = MID$(in$, n, 1)
NEXT n
 
'LOAD ALL WORDS FROM .TXT FILE
PRINT "LOADING...."
OPEN "WordList.txt" FOR INPUT AS #1  '<<<<<<<<< moved to more appropriate place
WHILE NOT EOF(1)
    filecount = filecount + 1
    INPUT #1, line$
    word$(filecount) = line$
WEND
CLOSE #1
CLS
 
 
'RUN THROUGH WORDS
FOR n = 1 TO filecount
    findkey = INSTR(word$(n), key$)
 
    IF findkey THEN
        FOR i = 1 TO LEN(word$(n))
            templtr$ = MID$(word$(n), i, 1)
            FOR x = 1 TO lenin '<<<<<<<<<<<<<<<<< main fix #1
                IF ltr$(x) = templtr$ THEN
                    ltr$(x) = ""
                    count = count + 1
                END IF
                'debugging
                'PRINT "Update: file word = "; word$(n); " and here is current letters crossed off: "; letters$
                'INPUT "OK press enter... "; wate$
            NEXT x
        NEXT i
 
        IF count = LEN(word$(n)) THEN
            PRINT word$(n)
        END IF
    END IF
 
    FOR m = 1 TO 9  'main fix #2 n's  to m's
        ltr$(m) = MID$(in$, m, 1)
    NEXT m
    count = 0
NEXT n
 
PRINT "DONE..."
 
'for debugging
FUNCTION letters$ ()
    FOR i = 1 TO 9
        b$ = b$ + ltr$(i)
    NEXT
    letters$ = "*" + b$ + "*"
END FUNCTION
 

DONE? Looks like the logic was OK, just the variable assignments needed fixing.

Title: Re: WordCracker
Post by: codeguy on October 07, 2018, 02:09:15 pm

Gives you every permutation of letters, numbers, etc. If parray() is ordered, so is the output.

Code: QB64:

WIDTH 80, 43
DIM parray(0 TO 3) AS LONG
FOR i = 0 TO UBOUND(parray)
    parray(i) = i
NEXT
PRINT
PRINT
PRINT "permutations"
Permute parray(), 0, UBOUND(parray), np
DO
    x$ = INKEY$
LOOP UNTIL x$ > ""
SYSTEM
 
SUB DisplayResults (PArray() AS LONG, start, finish, np AS DOUBLE)
DIM i AS LONG
PRINT USING "#,###,###,###,###"; np;
FOR i = LBOUND(parray) TO UBOUND(parray)
    PRINT PArray(i);
NEXT
PRINT
END SUB
 
SUB Rotate (parray() AS LONG, Start AS LONG, finish AS LONG)
DIM ts AS LONG
ts = parray(Start)
FOR i = Start TO finish - 1
    SWAP parray(i), parray(i + 1)
NEXT
parray(finish) = ts
END SUB
 
SUB Permute (parray() AS LONG, start AS LONG, finish AS LONG, np AS DOUBLE)
np = np + 1
DisplayResults parray(), LBOUND(parray), UBOUND(parray), np
IF start < finish THEN
    DIM i AS LONG
    DIM j AS LONG
    FOR i = finish - 1 TO start STEP -1
        FOR j = i + 1 TO finish
            SWAP parray(i), parray(j)
            Permute parray(), i + 1, finish, np
        NEXT
        Rotate parray(), i, finish
    NEXT
END IF
END SUB
 

Title: Re: WordCracker
Post by: bplus on October 07, 2018, 02:52:59 pm

Hi codeguy,

I thought about permutations and then thought nah! won't work with real words at varying lengths.

But I could be wrong, wouldn't be first time.

Wanna race? You modify your code to check for real words and I will see if I can optimize Zeppelin's code some more.... post in 24 hours?

Title: Re: WordCracker
Post by: SMcNeill on October 07, 2018, 03:32:28 pm

Here's how I'd go, I think:

First, reduce the word list to exclude any words longer than 9 letters. No need to search for impossible combinations.
Then count letters. Save these in an array WordLetters(1 TO WordCount,1 TO 26).

Then count letters in the target word.
Compare. If, for every letter, the target count >= the word count, then it's a match!

************************
The word list is going to be rather limited (less than 50k words I'd imagine), and most searches will terminate rather quickly. (need an "A", don't have any? Quit the search at this point.)

I really don't think you'd need to worry about optimizing for speed any more than that, to be honest.
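
The letter-count comparison outlined in this post can be sketched roughly as follows. This is only an illustration under stated assumptions, not Steve's actual code: `CountLetters` and `CanBuild%` are hypothetical names, a small 26-slot array per string stands in for the suggested WordLetters table, and only lowercase a-z is handled. A word qualifies when the pool of letters holds at least as many copies of every letter as the word needs.

```qb64
DIM pool(1 TO 26) AS LONG, need(1 TO 26) AS LONG

CountLetters "abcdefghi", pool()

CountLetters "deaf", need()
IF CanBuild%(pool(), need()) THEN PRINT "deaf can be built"

CountLetters "babe", need()
IF NOT CanBuild%(pool(), need()) THEN PRINT "babe needs two b's, the pool has only one"

' Tally how often each letter a-z occurs in s$ (other characters are ignored).
SUB CountLetters (s$, counts() AS LONG)
    DIM i AS LONG, c AS LONG
    FOR i = 1 TO 26: counts(i) = 0: NEXT
    FOR i = 1 TO LEN(s$)
        c = ASC(s$, i) - 96 ' "a" is 97, so a..z map to 1..26
        IF c >= 1 AND c <= 26 THEN counts(c) = counts(c) + 1
    NEXT
END SUB

' Returns -1 (true) when the pool covers every letter the word needs, else 0.
FUNCTION CanBuild% (pool() AS LONG, need() AS LONG)
    DIM i AS LONG
    CanBuild% = -1
    FOR i = 1 TO 26
        IF need(i) > pool(i) THEN CanBuild% = 0: EXIT FUNCTION
    NEXT
END FUNCTION
```

Because the comparison is per letter, repeated letters in either string are handled without any crossing-off step.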

Title: Re: WordCracker
Post by: bplus on October 07, 2018, 03:44:47 pm

Hmm... I think you are saying permutations are slower too?

I am thinking this part:

Code: QB64:

'SET KEY AND LETTERS
key$ = "a"
in$ = "abcdefghi"
lenin = LEN(in$) '<<<<<<<<<<< for faster processing
 

key$ and in$ might be input into the program at run time, so there might not be nine letters, and in$ might be a real word rather than a segment of the alphabet. That's how I would use this code for word games or word game solving. Real words might also contain 2 or 3 of the same letter. I was planning on optimizing for such cases.

Title: Re: WordCracker
Post by: SMcNeill on October 07, 2018, 04:28:46 pm

Here's an easy fix for you to optimize your code and run it faster. (An increase of about 15x faster on my machine.)

I'll post 2 routines here so you can run them and compare for speed differences:

YOURS (modified to loop and run for a set amount of time -- 5 seconds in this case):

Code: QB64:

DIM word$(84100)
DIM key$
DIM SHARED ltr$(9) '<<<<<<<<<<< shared to debug
 
'SET KEY AND LETTERS
key$ = "a"
in$ = "abcdefghi"
lenin = LEN(in$) '<<<<<<<<<<< for faster processing
 
'SPLIT UP LETTERS
FOR n = 1 TO 9
    ltr$(n) = MID$(in$, n, 1)
NEXT n
 
 
 
'LOAD ALL WORDS FROM .TXT FILE
PRINT "LOADING...."
OPEN "WordList.txt" FOR INPUT AS #1 '<<<<<<<<< moved to more appropriate place
filecount = 0
WHILE NOT EOF(1)
    filecount = filecount + 1
    INPUT #1, line$
    word$(filecount) = line$
WEND
CLOSE #1
 
 
t# = TIMER
 
timelimit = 5
DO UNTIL TIMER > t# + timelimit
    CLS
    loopsran = loopsran + 1
 
 
 
    'RUN THROUGH WORDS
    FOR n = 1 TO filecount
        findkey = INSTR(word$(n), key$)
 
        IF findkey THEN
            FOR i = 1 TO LEN(word$(n))
                templtr$ = MID$(word$(n), i, 1)
                FOR x = 1 TO lenin '<<<<<<<<<<<<<<<<< main fix #1
                    IF ltr$(x) = templtr$ THEN
                        ltr$(x) = ""
                        count = count + 1
                    END IF
                    'debugging
                    'PRINT "Update: file word = "; word$(n); " and here is current letters crossed off: "; letters$
                    'INPUT "OK press enter... "; wate$
                NEXT x
            NEXT i
 
            IF count = LEN(word$(n)) THEN
                PRINT word$(n)
            END IF
        END IF
 
        FOR m = 1 TO 9 'main fix #2 n's  to m's
            ltr$(m) = MID$(in$, m, 1)
        NEXT m
        count = 0
    NEXT n
 
    PRINT "DONE..."
LOOP
PRINT USING "###,###,###,###,### loops ran in ##.# seconds"; loopsran, timelimit
 
 
'for debugging
FUNCTION letters$ ()
    FOR i = 1 TO 9
        b$ = b$ + ltr$(i)
    NEXT
    letters$ = "*" + b$ + "*"
END FUNCTION

Modified:

Code: QB64:

DEFLNG A-Z
DIM word$(84100)
DIM key$
DIM SHARED ltr(9) '<<<<<<<<<<< shared to debug
 
'SET KEY AND LETTERS
key$ = "a"
in$ = "abcdefghi"
lenin = LEN(in$) '<<<<<<<<<<< for faster processing
 
'SPLIT UP LETTERS
FOR n = 1 TO 9
    ltr(n) = ASC(in$, n)
NEXT n
 
 
 
 
 
'LOAD ALL WORDS FROM .TXT FILE
PRINT "LOADING...."
OPEN "WordList.txt" FOR BINARY AS #1 '<<<<<<<<< moved to more appropriate place
filecount = 0
WHILE NOT EOF(1)
    LINE INPUT #1, line$
    IF LEN(line$) < 10 THEN
        filecount = filecount + 1
        word$(filecount) = line$
    END IF
WEND
CLOSE #1
 
 
t# = TIMER
timelimit = 5
DO UNTIL TIMER > t# + timelimit
    CLS
    loopsran = loopsran + 1
 
 
    'RUN THROUGH WORDS
    FOR n = 1 TO filecount
        findkey = INSTR(word$(n), key$)
 
        IF findkey THEN
            FOR i = 1 TO LEN(word$(n))
                templtr = ASC(word$(n), i)
                FOR x = 1 TO lenin '<<<<<<<<<<<<<<<<< main fix #1
                    IF ltr(x) = templtr THEN
                        ltr(x) = 0
                        count = count + 1
                    END IF
                NEXT x
                IF count <> i GOTO skipmore
            NEXT i
 
            IF count = LEN(word$(n)) THEN
                PRINT word$(n),
            END IF
        END IF
 
        skipmore:
        FOR m = 1 TO 9 'main fix #2 n's  to m's
            ltr(m) = ASC(in$, m)
        NEXT m
        count = 0
    NEXT n
 
    PRINT "DONE..."
LOOP
PRINT USING "###,###,###,###,### loops ran in ##.# seconds"; loopsran, timelimit
 
'for debugging
FUNCTION letters$ ()
    FOR i = 1 TO 9
        b$ = b$ + ltr$(i)
    NEXT
    letters$ = "*" + b$ + "*"
END FUNCTION

Yours will run 22 times in 5 seconds, mine 324 times...

The changes?

#1) STRINGS are gone. Why do we need them? Especially, WHY DO WE NEED MID$??

When going for speed routines, ASC outperforms MID$(x$,y,1) every time, hands down! This is a major performance boost.

#2) The word list is limited to begin with.

IF LEN(line$) < 10 THEN
filecount = filecount + 1
word$(filecount) = line$
END IF

You're not going to find 10 letter word matches with only 9 letters.

#3) (Not timed, but a huge speed boost): Changed file from INPUT to BINARY and changed INPUT # to LINE INPUT#. This makes loading the word list a whole heck of a lot faster. Even faster would be to load it all at once and then parse the words out of it, but who wants to go through the trouble for all of the few microseconds we'd save in this case?

Simple little things, but they affect the performance like crazy.
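
For anyone who wants to check the ASC versus MID$ claim on their own machine, here is a rough timing sketch. It is not taken from the programs above: the test string, the repeat count and the variable names are all made up for illustration, and the numbers will vary by machine and QB64 version.

```qb64
' Rough timing sketch: count the letter "e" in a long string two ways.
DIM s AS STRING, i AS LONG, rep AS LONG, hits AS LONG
DIM t1 AS DOUBLE, t2 AS DOUBLE

s = STRING$(1000000, "e")

' Version 1: pull out each character as a one-letter string with MID$.
t1 = TIMER(.001)
FOR rep = 1 TO 20
    FOR i = 1 TO LEN(s)
        IF MID$(s, i, 1) = "e" THEN hits = hits + 1
    NEXT
NEXT
t1 = TIMER(.001) - t1

' Version 2: compare the character code directly with ASC (101 is "e").
hits = 0
t2 = TIMER(.001)
FOR rep = 1 TO 20
    FOR i = 1 TO LEN(s)
        IF ASC(s, i) = 101 THEN hits = hits + 1
    NEXT
NEXT
t2 = TIMER(.001) - t2

PRINT "MID version:"; t1; "seconds"
PRINT "ASC version:"; t2; "seconds"
```

The MID$ version has to build a temporary one-character string on every pass, while ASC only returns a number, which is the likely reason for the gap.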

Title: Re: WordCracker
Post by: SMcNeill on October 07, 2018, 05:01:34 pm

And, if we create a letter index for our words (since we're going to do a limit by key$), we can almost double the performance. (Probably more for certain letters which aren't going to be indexed as much as "a" is, in this case.)

Code: QB64:

DEFLNG A-Z
DIM word$(84100)
DIM key$
DIM SHARED ltr(9) '<<<<<<<<<<< shared to debug
DIM backup(9)
DIM keycount(97 TO 122, 50000) 'letter/index
 
'SET KEY AND LETTERS
key$ = "a"
in$ = "abcdefghi"
lenin = LEN(in$) '<<<<<<<<<<< for faster processing
 
'SPLIT UP LETTERS
FOR n = 1 TO 9
    ltr(n) = ASC(in$, n)
    backup(n) = ltr(n)
NEXT n
 
 
 
 
 
'LOAD ALL WORDS FROM .TXT FILE
PRINT "LOADING...."
OPEN "WordList.txt" FOR BINARY AS #1 '<<<<<<<<< moved to more appropriate place
filecount = 0
WHILE NOT EOF(1)
    LINE INPUT #1, line$
    IF LEN(line$) < 10 THEN
        filecount = filecount + 1
        word$(filecount) = line$
        FOR i = 97 TO 122
            IF INSTR(line$, CHR$(i)) THEN
                keycount(i, 0) = keycount(i, 0) + 1 'record 0 tells us how many there are for that letter
                keycount(i, keycount(i, 0)) = filecount 'add the word number to the proper index
            END IF
        NEXT
    END IF
WEND
CLOSE #1
 
 
t# = TIMER
timelimit = 5
DO UNTIL TIMER > t# + timelimit
    CLS
    loopsran = loopsran + 1
 
 
    'RUN THROUGH WORDS
    findkey = ASC(key$)
    FOR n = 1 TO keycount(findkey, 0)
        w$ = word$(keycount(findkey, n))
        l = LEN(w$)
        FOR i = 1 TO l
            templtr = ASC(w$, i)
            FOR x = 1 TO lenin '<<<<<<<<<<<<<<<<< main fix #1
                IF ltr(x) = templtr THEN
                    ltr(x) = 0
                    count = count + 1
                END IF
            NEXT x
            IF count <> i GOTO skipmore
        NEXT i
 
        IF count = l THEN PRINT w$,
 
        skipmore:
        FOR m = 1 TO 9 'main fix #2 n's  to m's
            ltr(m) = backup(m)
        NEXT m
        count = 0
    NEXT n
 
    PRINT "DONE..."
LOOP
PRINT USING "###,###,###,###,### loops ran in ##.# seconds"; loopsran, timelimit
 
'for debugging
FUNCTION letters$ ()
    FOR i = 1 TO 9
        b$ = b$ + ltr$(i)
    NEXT
    letters$ = "*" + b$ + "*"
END FUNCTION
 

No need to check each and every word for a letter, if we already have built an index to tell us which words contain those letters. ;)

(Current run time is 488 loops in 5 seconds on my PC, if you're curious about the actual number to compare improvement.)

Title: Re: WordCracker
Post by: bplus on October 07, 2018, 05:01:54 pm

Here's what I had in mind:

Code: QB64:

' WordCrack mod 1.bas B+ 2018-10-07
 
INPUT "Enter a string to build words from "; in$
'load words
OPEN "WordList.txt" FOR BINARY AS #1
gulp& = LOF(1)
buff$ = STRING$(gulp&, " ")
GET #1, , buff$
CLOSE #1
REDIM word$(0)
Split buff$, CHR$(10), word$()
filecount& = UBOUND(word$)
'RUN THROUGH WORDS
FOR n& = 0 TO filecount&
    c$ = in$
    OK% = -1
    FOR i% = 1 TO LEN(word$(n&))
        p% = INSTR(c$, MID$(word$(n&), i%, 1))
        IF p% = 0 THEN
            OK% = 0: EXIT FOR
        ELSE
            MID$(c$, p%, 1) = "+"
        END IF
    NEXT
    IF OK% THEN PRINT word$(n&); ", ";
NEXT
PRINT "DONE..."
 
 
SUB Split (mystr AS STRING, delim AS STRING, arr() AS STRING)
    ' bplus modifications of Galleon fix of Bulrush Split reply #13
    ' http://www.[abandoned, outdated and now likely malicious qb64 dot net website - don’t go there]/forum/index.php?topic=1612.0
    ' this sub further developed and tested here: \test\Strings\Split test.bas
    DIM copy AS STRING, p AS LONG, curpos AS LONG, arrpos AS LONG, lc AS LONG, dpos AS LONG
    copy = mystr 'make copy since we are messing with mystr
    'special case if delim is space, probably want to remove all excess space
    IF delim = " " THEN
        copy = RTRIM$(LTRIM$(copy))
        p = INSTR(copy, "  ")
        WHILE p > 0
            copy = MID$(copy, 1, p - 1) + MID$(copy, p + 1)
            p = INSTR(copy, "  ")
        WEND
    END IF
    curpos = 1
    arrpos = 0
    lc = LEN(copy)
    dpos = INSTR(curpos, copy, delim)
    DO UNTIL dpos = 0
        arr(arrpos) = MID$(copy, curpos, dpos - curpos)
        arrpos = arrpos + 1
        REDIM _PRESERVE arr(arrpos + 1) AS STRING
        curpos = dpos + LEN(delim)
        dpos = INSTR(curpos, copy, delim)
    LOOP
    arr(arrpos) = MID$(copy, curpos)
    REDIM _PRESERVE arr(arrpos) AS STRING
END SUB
 

Looks pretty simple to me.

Yes, the BINARY load is the most significant speed improvement! :)

Title: Re: WordCracker
Post by: SMcNeill on October 07, 2018, 05:26:23 pm

And another small modification of the original gives us another decent speed boost:

Code: QB64:

DEFLNG A-Z
DIM word$(84100)
DIM keys
DIM SHARED ltr(1 TO 9) '<<<<<<<<<<< shared to debug
DIM backup(1 TO 9)
DIM keycount(97 TO 122, 50000) 'letter/index
DIM m1 AS _MEM, m2 AS _MEM 'just for quick array restoration
m1 = _MEM(ltr()): m2 = _MEM(backup())
 
'SET KEY AND LETTERS
keys = ASC("a")
in$ = "abcdefghi"
lenin = LEN(in$) '<<<<<<<<<<< for faster processing
 
'SPLIT UP LETTERS
FOR n = 1 TO 9
    ltr(n) = ASC(in$, n)
NEXT n
_MEMPUT m2, m2.OFFSET, ltr()
 
 
'LOAD ALL WORDS FROM .TXT FILE
PRINT "LOADING...."
OPEN "WordList.txt" FOR BINARY AS #1 '<<<<<<<<< moved to more appropriate place
filecount = 0
WHILE NOT EOF(1)
    LINE INPUT #1, line$
    IF LEN(line$) < 10 THEN
        filecount = filecount + 1
        word$(filecount) = line$
        FOR i = 97 TO 122
            IF INSTR(line$, CHR$(i)) THEN
                keycount(i, 0) = keycount(i, 0) + 1 'record 0 tells us how many there are for that letter
                keycount(i, keycount(i, 0)) = filecount 'add the word number to the proper index
            END IF
        NEXT
    END IF
WEND
CLOSE #1
 
 
t# = TIMER
timelimit = 5
 
 
DO UNTIL TIMER > t# + timelimit
    CLS
    loopsran = loopsran + 1
 
 
    'RUN THROUGH WORDS
    FOR n = 1 TO keycount(keys, 0)
        w$ = word$(keycount(keys, n))
        l = LEN(w$)
        FOR i = 1 TO l
            templtr = ASC(w$, i)
            FOR x = 1 TO lenin '<<<<<<<<<<<<<<<<< main fix #1
                IF ltr(x) = templtr THEN
                    ltr(x) = 0
                    count = count + 1
                    EXIT FOR
                END IF
            NEXT x
            IF count <> i GOTO skipmore
        NEXT i
 
        PRINT w$,
 
        skipmore:
        $CHECKING:OFF
        _MEMPUT m1, m1.OFFSET, backup()
        $CHECKING:ON
        count = 0
    NEXT n
 
    PRINT "DONE..."
LOOP
PRINT USING "###,###,###,###,### loops ran in ##.# seconds"; loopsran, timelimit
 
'for debugging
FUNCTION letters$ ()
    FOR i = 1 TO 9
        b$ = b$ + ltr$(i)
    NEXT
    letters$ = "*" + b$ + "*"
END FUNCTION
 

Instead of a FOR...NEXT loop to reset the search list, I simply restored it from a backup array with _MEM.

From:

FOR m = 1 TO 9 'main fix #2 n's to m's
ltr$(m) = MID$(in$, m, 1)
NEXT m

To:

$CHECKING:OFF
_MEMPUT m1, m1.OFFSET, backup()
$CHECKING:ON

*****************
*****************

I'm now getting ~900 runs in a short 5 second period. I'd call that fast enough for just about anything I'd need to use it for. (And a nice improvement from the original 22 runs in 5 seconds.) ;)
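
The backup-and-restore idea can be tried in isolation with a sketch like the one below. Note that it swaps in `_MEMCOPY` rather than the `_MEMPUT` call used in the program above, and the names `mLtr` and `mBackup` are invented for this example, so treat it as one possible way to do the block copy rather than the method from the post.

```qb64
DIM ltr(1 TO 9) AS LONG, backup(1 TO 9) AS LONG
DIM mLtr AS _MEM, mBackup AS _MEM
DIM n AS LONG

' Fill the working array and keep an untouched copy.
FOR n = 1 TO 9
    ltr(n) = ASC("abcdefghi", n)
    backup(n) = ltr(n)
NEXT

mLtr = _MEM(ltr())
mBackup = _MEM(backup())

ltr(1) = 0: ltr(5) = 0 ' cross off a couple of letters, as the search loop does

' Restore the whole working array with a single block copy instead of a loop.
_MEMCOPY mBackup, mBackup.OFFSET, mBackup.SIZE TO mLtr, mLtr.OFFSET

' If the copy worked, all nine letters are back.
FOR n = 1 TO 9
    PRINT CHR$(ltr(n));
NEXT
PRINT

_MEMFREE mLtr
_MEMFREE mBackup
```

Both arrays must have the same element type and size for the copy to line up; here both are LONG arrays of nine elements.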

Title: Re: WordCracker
Post by: bplus on October 07, 2018, 05:40:28 pm

I am getting 363 cycles for abcdefghi in 5 secs with Windows 10 Intel Core i5 processor without any preprocessing of the word$() array.

Title: Re: WordCracker
Post by: bplus on October 07, 2018, 07:24:40 pm

Steve, your last code improvement is running 1067 or so loops on my machine.

Yeah OK, checking word length first does make sense and adds another 100 loops in 5 secs:

Code: QB64:

' WordCrack mod 1.bas B+ 2018-10-07
 
' now with timer mod
 
'INPUT "Enter a string to build words from "; in$
in$ = "abcdefghi"
in$ = LCASE$(in$)
lenin = LEN(in$)
 
'load words
OPEN "WordList.txt" FOR BINARY AS #1
gulp& = LOF(1)
buff$ = STRING$(gulp&, " ")
GET #1, , buff$
CLOSE #1
REDIM word$(0)
Split buff$, CHR$(10), word$()
filecount& = UBOUND(word$)
start! = TIMER
WHILE TIMER - start! < 5
    'RUN THROUGH WORDS
    FOR n& = 0 TO filecount&
        IF LEN(word$(n&)) <= lenin THEN
            c$ = in$
            OK% = -1
            FOR i% = 1 TO LEN(word$(n&))
                p% = INSTR(c$, MID$(word$(n&), i%, 1))
                IF p% = 0 THEN
                    OK% = 0: EXIT FOR
                ELSE
                    MID$(c$, p%, 1) = "+"
                END IF
            NEXT
            IF OK% THEN PRINT word$(n&); ", ";
        END IF
    NEXT
    counter = counter + 1
WEND
PRINT "Loop count in 5 secs ="; counter
END
 
SUB Split (mystr AS STRING, delim AS STRING, arr() AS STRING)
    ' bplus modifications of Galleon fix of Bulrush Split reply #13
    ' http://www.[abandoned, outdated and now likely malicious qb64 dot net website - don’t go there]/forum/index.php?topic=1612.0
    ' this sub further developed and tested here: \test\Strings\Split test.bas
    DIM copy AS STRING, p AS LONG, curpos AS LONG, arrpos AS LONG, lc AS LONG, dpos AS LONG
    copy = mystr 'make copy since we are messing with mystr
    'special case if delim is space, probably want to remove all excess space
    IF delim = " " THEN
        copy = RTRIM$(LTRIM$(copy))
        p = INSTR(copy, "  ")
        WHILE p > 0
            copy = MID$(copy, 1, p - 1) + MID$(copy, p + 1)
            p = INSTR(copy, "  ")
        WEND
    END IF
    curpos = 1
    arrpos = 0
    lc = LEN(copy)
    dpos = INSTR(curpos, copy, delim)
    DO UNTIL dpos = 0
        arr(arrpos) = MID$(copy, curpos, dpos - curpos)
        arrpos = arrpos + 1
        REDIM _PRESERVE arr(arrpos + 1) AS STRING
        curpos = dpos + LEN(delim)
        dpos = INSTR(curpos, copy, delim)
    LOOP
    arr(arrpos) = MID$(copy, curpos)
    REDIM _PRESERVE arr(arrpos) AS STRING
END SUB
 

Title: Re: WordCracker
Post by: codeguy on October 08, 2018, 12:23:37 am

This generates strings and retrieves matches for permutations of letters, creating words up to 8 characters long, and in order too. On my humble machine it takes (39/64)s. Because the permutations are generated in lexical order AND the wordlist is in lexical order (both ascending), this becomes nothing more than a simple merge/find algorithm, the same kind used for batch database updates. Quick? Judge for yourself: it found 58 matches in 322560 variable-length permutations. I am uncertain how many loops/second this code executes, but I'm gonna say it's 322560/(39/64), roughly 529329 trials/second on my humble 2.16GHz machine. On my machine, this will also execute about 8 times in 5 seconds, perhaps a wee bit more. I am uncertain how fast Steve's CPU is, but mine is no speed demon, nor is it super slow. Mine is roughly 240,000 trials/second per GHz.

Code: QB64:

WIDTH 80, 43
DIM parray(0 TO 7) AS LONG '* checks words up to 8 characters long
theword$ = "sequoias"
FOR i = 0 TO UBOUND(parray)
    parray(i) = ASC(theword$, i + 1)
NEXT
s& = LBOUND(parray)
h& = UBOUND(parray)
DO
    an& = s& + 1
    FOR q& = an& TO h&
        IF parray(q&) < parray(s&) THEN
            SWAP parray(q&), parray(s&)
        END IF
    NEXT
    s& = an&
LOOP WHILE s& <= h&
PRINT
PRINT
Wlist% = FREEFILE
OPEN ".\wordlist.txt" FOR BINARY AS Wlist%
PRINT LOF(Wlist%)
chunk$ = INPUT$(LOF(Wlist%), Wlist%)
CLOSE Wlist%
REDIM words$(0 TO 999999)
Wct& = 0
FOR u& = 1 TO LEN(chunk$)
    IF ASC(chunk$, u&) = 10 OR u& > LEN(chunk$) THEN
        IF LEN(w$) <= 8 THEN
            words$(Wct&) = w$
            PRINT w$; Wct&
            Wct& = Wct& + 1
        END IF
        w$ = ""
    ELSE
        w$ = w$ + MID$(chunk$, u&, 1)
    END IF
NEXT
chunk$ = ""
WordIndex& = 0 '* same name as the variable passed to Permute below
nmatches& = 0
'_DELAY 60
t! = TIMER(.001)
PRINT "permutations"
Permute parray(), 0, UBOUND(parray), np#, words$(), WordIndex&, Wct&, nmatches&, npermtrials#
f! = TIMER(.001)
PRINT f! - t!; nmatches&; np#; npermtrials#
DO
    x$ = INKEY$
LOOP UNTIL x$ > ""
SYSTEM
 
SUB DisplayResults (PArray() AS LONG, start, finish, np AS DOUBLE)
    DIM i AS LONG
    PRINT USING "#,###,###,###,###"; np;
    FOR i = LBOUND(parray) TO UBOUND(parray)
        PRINT PArray(i);
    NEXT
    PRINT
END SUB
 
SUB Rotate (parray() AS LONG, Start AS LONG, finish AS LONG)
    DIM ts AS LONG
    ts = parray(Start)
    FOR i = Start TO finish - 1
        SWAP parray(i), parray(i + 1)
    NEXT
    parray(finish) = ts
END SUB
 
SUB Permute (parray() AS LONG, start AS LONG, finish AS LONG, np AS DOUBLE, words$(), index&, wct&, matchcount&, NPermsTried#)
    np = np + 1
    IF index& < wct& THEN
        FOR a& = 0 TO UBOUND(parray)
            v$ = array2word$(parray(), LBOUND(parray), LBOUND(parray) + a&)
            DO
                IF words$(index&) < v$ THEN
                    index& = index& + 1
                ELSE
                    NPermsTried# = NPermsTried# + 1
                    IF words$(index&) = v$ THEN
                        matchcount& = matchcount& + 1
                        PRINT v$; ":match:"; words$(index&); index&; matchcount&
                    END IF
                    EXIT DO
                END IF
            LOOP
        NEXT
    ELSE
        EXIT SUB
    END IF
    IF start < finish THEN
        DIM i AS LONG
        DIM j AS LONG
        FOR i = finish - 1 TO start STEP -1
            FOR j = i + 1 TO finish
                SWAP parray(i), parray(j)
                Permute parray(), i + 1, finish, np, words$(), index&, wct&, matchcount&, NPermsTried#
            NEXT
            Rotate parray(), i, finish
        NEXT
    END IF
END SUB
 
FUNCTION array2word$ (parray() AS LONG, start AS LONG, finish AS LONG)
    m$ = SPACE$(finish - start + 1)
    POSITION& = 1
    FOR z& = start TO finish
        MID$(m$, POSITION&) = CHR$(parray(z&))
        POSITION& = POSITION& + 1
    NEXT
    array2word$ = m$
END FUNCTION
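
The merge/find step described at the top of this post can be shown on its own with two tiny hand-made lists. This is a simplified sketch, not an extract from the program above: the array names and the sample words are invented, and it assumes both lists are already sorted ascending.

```qb64
DIM words(1 TO 5) AS STRING, cands(1 TO 4) AS STRING
DIM w AS LONG, c AS LONG

' Sorted "dictionary" and sorted candidate strings.
words(1) = "ace": words(2) = "bad": words(3) = "cab": words(4) = "dab": words(5) = "fed"
cands(1) = "abc": cands(2) = "bad": cands(3) = "cab": cands(4) = "dba"

w = 1: c = 1
DO WHILE w <= UBOUND(words) AND c <= UBOUND(cands)
    IF words(w) < cands(c) THEN
        w = w + 1 ' this word sorts before every remaining candidate, skip it
    ELSEIF words(w) > cands(c) THEN
        c = c + 1 ' this candidate is not in the word list
    ELSE
        PRINT "match: "; cands(c)
        c = c + 1
    END IF
LOOP
```

Each list is walked only once, so the cost grows with the combined length of the two lists rather than with their product.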
 

Title: Re: WordCracker
Post by: Zeppelin on October 08, 2018, 01:11:38 am

Wow. Thanks everyone for your responses. Thanks for the performance tips (they really help; I'm still learning).
It's nice to see everyone trying to outdo each other's programs.

I have to say, SMcNeill..... what do you do with your life?

Just joking man. Unbelievably fast program by the way.

Thanks everyone,
Zeppelin

Title: Re: WordCracker
Post by: bplus on October 08, 2018, 10:10:05 am

Made some more improvements, like getting rid of blank words and preprocessing the word list according to the number of letters input to build words from. "abcdefghi" now runs around 518 loops in 5 secs and finds 351 words from "Steve McNeil" input. Steve's code crashes if "Steve McNeil" is input. ;-))

Code: QB64:

_TITLE "WordCrack mod 2 with preprocessing.bas B+ 2018-10-08"
 
' now with timer mod and preprocessing of word file to length of in$
 
'INPUT "Enter a string to build words from "; inp$
inp$ = "Steve McNeil"
in$ = LCASE$(inp$)
lenin = LEN(in$)
 
'load words
OPEN "WordList.txt" FOR BINARY AS #1
gulp& = LOF(1)
buff$ = STRING$(gulp&, " ")
GET #1, , buff$
CLOSE #1
REDIM longWords$(0)
Split buff$, CHR$(10), longWords$()
 
'preprocess word list, use only words <= the build words string
DIM word$(84100)
FOR i& = 0 TO UBOUND(longWords$)
    IF LEN(longWords$(i&)) <= lenin THEN
        IF LTRIM$(longWords$(i&)) <> "" THEN
            IF ASC(longWords$(i&)) > 96 AND ASC(longWords$(i&)) < 123 THEN
                wi& = wi& + 1
                word$(wi&) = longWords$(i&)
            END IF
        END IF
    END IF
NEXT
 
'NOW do the loop count, should add lots of loops checking one less thing per loop and looping less times
start! = TIMER
WHILE TIMER - start! < 5
    'RUN THROUGH WORDS
    foundWords% = 0
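    FOR n& = 0 TO wi& 'note: word$(0) is never filled above (wi& starts at 1), so the empty string at index 0 is counted as a match; start at 1 to skip it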
        c$ = in$
        OK% = -1
        FOR i% = 1 TO LEN(word$(n&))
            p% = INSTR(c$, MID$(word$(n&), i%, 1))
            IF p% = 0 THEN
                OK% = 0: EXIT FOR
            ELSE
                MID$(c$, p%, 1) = "+"
            END IF
        NEXT
        IF OK% THEN foundWords% = foundWords% + 1: PRINT word$(n&); ", ";
    NEXT
    counter% = counter% + 1
WEND
PRINT: PRINT: PRINT "Found:"; foundWords%; "words in "; CHR$(34); inp$; CHR$(34); ","; counter%; "times in 5 secs."
END
 
SUB Split (mystr AS STRING, delim AS STRING, arr() AS STRING)
    ' bplus modifications of Galleon fix of Bulrush Split reply #13
    ' http://www.[abandoned, outdated and now likely malicious qb64 dot net website - don’t go there]/forum/index.php?topic=1612.0
    ' this sub further developed and tested here: \test\Strings\Split test.bas
    DIM copy AS STRING, p AS LONG, curpos AS LONG, arrpos AS LONG, lc AS LONG, dpos AS LONG
    copy = mystr 'make copy since we are messing with mystr
    'special case if delim is space, probably want to remove all excess space
    IF delim = " " THEN
        copy = RTRIM$(LTRIM$(copy))
        p = INSTR(copy, "  ")
        WHILE p > 0
            copy = MID$(copy, 1, p - 1) + MID$(copy, p + 1)
            p = INSTR(copy, "  ")
        WEND
    END IF
    curpos = 1
    arrpos = 0
    lc = LEN(copy)
    dpos = INSTR(curpos, copy, delim)
    DO UNTIL dpos = 0
        arr(arrpos) = MID$(copy, curpos, dpos - curpos)
        arrpos = arrpos + 1
        REDIM _PRESERVE arr(arrpos + 1) AS STRING
        curpos = dpos + LEN(delim)
        dpos = INSTR(curpos, copy, delim)
    LOOP
    arr(arrpos) = MID$(copy, curpos)
    REDIM _PRESERVE arr(arrpos) AS STRING
END SUB
 

PS codeguy's code is going to crash also if only permutations up to 8 letters are allowed, without even checking I can see that.

Less than 4 hours to go. Will Steve fix his code in time? Stay tuned... ;)

Title: Re: WordCracker
Post by: bplus on October 08, 2018, 10:23:45 am

Steve wins again! My name only has 271 words in it. :) (but none are vile)

Title: Re: WordCracker
Post by: SMcNeill on October 08, 2018, 11:27:46 am

Modified to run with any size string (even "Steve McNeil"; which, btw, is named WRONG. /cry!!):

Code: QB64:

WIDTH 80, 50
DEFLNG A-Z
'SET KEY AND LETTERS
keys = 0 ' ASC("S") OR &B00100000 '0 for no key limiter, otherwise use the ASC("letter") OR &B00100000 to force lowercase value
in$ = LCASE$("SteveMcNeil")
lenin = LEN(in$) '<<<<<<<<<<< for faster processing
 
 
DIM word$(84100)
DIM SHARED ltr(1 TO lenin) '<<<<<<<<<<< shared to debug
DIM backup(1 TO lenin)
DIM keycount(97 TO 122, 60000) 'letter/index
DIM m1 AS _MEM, m2 AS _MEM 'just for quick array restoration
m1 = _MEM(ltr()): m2 = _MEM(backup())
 
 
'SPLIT UP LETTERS
FOR n = 1 TO lenin
    ltr(n) = ASC(in$, n) OR &B00100000
NEXT n
_MEMPUT m2, m2.OFFSET, ltr()
 
 
'LOAD ALL WORDS FROM .TXT FILE
PRINT "LOADING...."
OPEN "WordList.txt" FOR BINARY AS #1 '<<<<<<<<< moved to more appropriate place
filecount = 0
WHILE NOT EOF(1)
    LINE INPUT #1, line$
    IF LEN(line$) <= lenin THEN
        filecount = filecount + 1
        word$(filecount) = line$
        FOR i = 97 TO 122
            IF INSTR(line$, CHR$(i)) THEN
                keycount(i, 0) = keycount(i, 0) + 1 'record 0 tells us how many there are for that letter
                keycount(i, keycount(i, 0)) = filecount 'add the word number to the proper index
            END IF
        NEXT
    END IF
WEND
CLOSE #1
 
 
t# = TIMER
timelimit = 5
 
 
DO UNTIL TIMER > t# + timelimit
    CLS
    loopsran = loopsran + 1
    totalcount = 0
 
 
    'RUN THROUGH WORDS
    IF keys = 0 THEN k = filecount ELSE k = keycount(keys, 0)
    FOR n = 1 TO k
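        IF keys = 0 THEN w$ = word$(n) ELSE w$ = word$(keycount(keys, n)) 'with a key letter, look the word up through the index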
        FOR i = 1 TO LEN(w$)
            templtr = ASC(w$, i)
            FOR x = 1 TO lenin
                IF ltr(x) = templtr THEN ltr(x) = 0: GOTO stillvalid
            NEXT x
            GOTO skipmore
            stillvalid:
        NEXT i
 
        totalcount = totalcount + 1
        PRINT w$,
 
        skipmore:
        $CHECKING:OFF
        _MEMPUT m1, m1.OFFSET, backup()
        $CHECKING:ON
        count = 0
    NEXT n
 
    PRINT "DONE..."
LOOP
PRINT USING "###,###,###,###,### loops ran in ##.# seconds"; loopsran, timelimit
PRINT totalcount; "matches found for "; in$

About 300 loops in 5 seconds with the longer name giving us more letters to check against. Oddly enough, my list is only counting 350 words for us, and not 351 as Bplus says his is generating. Is this a glitch in my counting method? His? Or is somebody leaving a word out, or including a false positive?

I dunno!

Some digging will be required to see where that extra word comes from, and what the heck it is! :P

Title: Re: WordCracker
Post by: SMcNeill on October 08, 2018, 11:41:33 am

Testing shows that Bplus is wrong. :D

Compare the 2 file dumps (the first is mine, the second is Bplus's), and it's immediately obvious what the issue is, right at the very top of the file -- a BLANK word. There's 350 words, with an extra "" at the top of his list.
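
A blank entry like that slips in easily when the word array is walked from index 0, or when the file has an empty line. A minimal sketch of a guard, assuming a small in-memory list in place of WordList.txt (the `sample$()` array and its contents are made up for illustration):

```qb64
' Hypothetical data: slot 0 is left empty on purpose, like the "" in the dump.
DIM sample$(0 TO 3)
sample$(1) = "eve"
sample$(2) = "me"
sample$(3) = "steve"

counted& = 0
FOR n& = 0 TO UBOUND(sample$)
    ' Skip empty (or all-space) entries so "" is never counted as a word.
    IF LEN(LTRIM$(RTRIM$(sample$(n&)))) > 0 THEN
        counted& = counted& + 1
        PRINT sample$(n&)
    END IF
NEXT n&
PRINT counted&; "words counted"
```

With the guard in place the loop counts 3 words here, not 4.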

Steve wins again! ;D

Title: Re: WordCracker
Post by: bplus on October 08, 2018, 12:09:55 pm

Cra...

Sorry about your name Steve, my eyes are going bad.

Wait, "" isn't a word? ;-)) (where the heck did that come from?)

Thanks for heads up!

Title: Re: WordCracker
Post by: SMcNeill on October 08, 2018, 12:44:28 pm

Quote from: bplus on October 08, 2018, 12:09:55 pm

Cra...

Sorry about your name Steve, my eyes are going bad.

No worries; it's the story of my life... My family has lived here since forever, so when it was time to name the roads, the powers that be named the road here after my family...

And, spelt it wrong!

No kidding!!

You can find me on McNeil Hill Rd, still laughing at the irony of being immortalized incorrectly.

*****************

And funniest thing??

The local government says we can change the name, as long as WE are willing to pay the $$$$$ to change the road signs.

Two little signs, only needing an extra "l" in them, and yet to comply with state standards, they cost over $2600 each!

It's no damn wonder our government is broke, trying to hit such costs just to stay within regulation. On a low traveled, rural as heck road like mine, a $2.00 slab of wood, $1 can of white paint, and $1 can of black paint for lettering, would be more than sufficient to serve the needs of the community...

And yet... The sign has to be a certain gauge steel, painted X coats of reflective green, with Y coats of Z-size white paint, mixed with XX percent reflective beads....

(And, just in case you want to see how nutty regulations are for government stuff, here's the regs on signs: http://www.vdot.virginia.gov/business/resources/TED/final_MUTCD/2013_sup/Revision_1_Part_2_Signs.pdf --- Well, it's the BASIC regulations. There are 14 supplements, with 5 supplements to supplement the supplements also...)

Title: Re: WordCracker
Post by: Cobalt on October 08, 2018, 01:04:32 pm

Just do what the kids do: take some paint and add your own extra 'l'. Find some spray paint that's lying around and it costs you nothing! Unless you get caught and fined. Or go the extra mile with some masking tape so it looks better!
175 pages, does that meet novel requirements too?

Title: Re: WordCracker
Post by: bplus on October 08, 2018, 03:29:34 pm

Yep, for shorter strings Steve's loop count is a winner!

But for longer strings this current version is a winner.

Code: QB64: [Select]

WIDTH 80, 50
_TITLE "WordCrack mod 2 with preprocessing.bas B+ 2018-10-08"
 
' now with timer mod and preprocessing of word file to length of in$
 
' fix the extra "" word being printed   12:28 PM
' use same print method as Steve's code, change type for wordsFound
 
'INPUT "Enter a string to build words from "; inp$
inp$ = "thequickbrownfoxjumpedoverthelazydog"
'inp$ = "thequickbrownfoxjumpedoverthelazydogdreamingofcountingsheep"
'inp$ = "thequickbrownfoxjumpedoverthelazydogdreamingofcountingsheepthinkingofbrownfoxjumpingoverdogs"
in$ = LCASE$(inp$)
lenin = LEN(in$)
 
'load words
OPEN "WordList.txt" FOR BINARY AS #1
gulp& = LOF(1)
buff$ = STRING$(gulp&, " ")
GET #1, , buff$
CLOSE #1
REDIM longWords$(0)
Split buff$, CHR$(10), longWords$()
 
'preprocess word list, use only words <= the build words string
DIM word$(84100)
FOR i& = 0 TO UBOUND(longWords$)
    IF LEN(longWords$(i&)) <= lenin THEN
        IF LTRIM$(longWords$(i&)) <> "" THEN
            IF ASC(longWords$(i&)) > 96 AND ASC(longWords$(i&)) < 123 THEN
                wi& = wi& + 1
                word$(wi&) = longWords$(i&)
            END IF
        END IF
    END IF
NEXT
 
'NOW do the loop count, should add lots of loops checking one less thing per loop and looping less times
start! = TIMER
WHILE TIMER - start! < 5
    'RUN THROUGH WORDS
    foundWords& = 0
    FOR n& = 1 TO wi& '<<<<<<<<<<<<<<<<<<<  fixed from 0 to 1
        c$ = in$
        OK% = -1
        FOR i% = 1 TO LEN(word$(n&))
            p% = INSTR(c$, MID$(word$(n&), i%, 1))
            IF p% = 0 THEN
                OK% = 0: EXIT FOR
            ELSE
                MID$(c$, p%, 1) = "+"
            END IF
        NEXT
        IF OK% THEN foundWords& = foundWords& + 1: PRINT word$(n&),
    NEXT
    counter& = counter& + 1
WEND
PRINT: PRINT: PRINT "Found:"; foundWords&; "words in "; CHR$(34); inp$; CHR$(34); ","; counter&; "times in 5 secs."
END
 
SUB Split (mystr AS STRING, delim AS STRING, arr() AS STRING)
    ' bplus modifications of Galleon fix of Bulrush Split reply #13
    ' http://www.[abandoned, outdated and now likely malicious qb64 dot net website - don’t go there]/forum/index.php?topic=1612.0
    ' this sub further developed and tested here: \test\Strings\Split test.bas
    DIM copy AS STRING, p AS LONG, curpos AS LONG, arrpos AS LONG, lc AS LONG, dpos AS LONG
    copy = mystr 'make copy since we are messing with mystr
    'special case if delim is space, probably want to remove all excess space
    IF delim = " " THEN
        copy = RTRIM$(LTRIM$(copy))
        p = INSTR(copy, "  ")
        WHILE p > 0
            copy = MID$(copy, 1, p - 1) + MID$(copy, p + 1)
            p = INSTR(copy, "  ")
        WEND
    END IF
    curpos = 1
    arrpos = 0
    lc = LEN(copy)
    dpos = INSTR(curpos, copy, delim)
    DO UNTIL dpos = 0
        arr(arrpos) = MID$(copy, curpos, dpos - curpos)
        arrpos = arrpos + 1
        REDIM _PRESERVE arr(arrpos + 1) AS STRING
        curpos = dpos + LEN(delim)
        dpos = INSTR(curpos, copy, delim)
    LOOP
    arr(arrpos) = MID$(copy, curpos)
    REDIM _PRESERVE arr(arrpos) AS STRING
END SUB
 
 

Loops in 5 seconds, this version versus Steve's:
inp$ = "thequickbrownfoxjumpedoverthelazydog": 74 versus 65
inp$ = "thequickbrownfoxjumpedoverthelazydogdreamingofcountingsheep": 41 versus 36-37
inp$ = "thequickbrownfoxjumpedoverthelazydogdreamingofcountingsheepthinkingofbrownfoxjumpingoverdogs": 37 versus 32-33

Title: Re: WordCracker
Post by: SMcNeill on October 08, 2018, 03:40:47 pm

Quote

Yep, for shorter strings Steve's loop count is a winner!

But for longer strings this current version is a winner.

When dealing with such long strings, I'd definitely go a different route. Read letters. Preset an array. Just compare letter counts. I imagine it'd be a ton faster than how we're currently doing it, but that's because the program needs would be much different than originally stated.

I'll play around with it some and see just how quick a routine I can whip up for those longer words/phrases. :)

Title: Re: WordCracker
Post by: codeguy on October 08, 2018, 03:58:27 pm

I got 361 words matching stevemcneill using the wordlist.txt as included in an attachment for this post. Somewhere, someone is wrong. Mine was modified to accommodate the added characters. Yes, it took a while as I did no optimizations for eliminating impossible prefixes and suffixes.

Title: Re: WordCracker
Post by: bplus on October 08, 2018, 04:22:34 pm

Hi codeguy,

For stevemcneill, 2 l's on end, I am getting 392 words with Steve's code and my version.

I don't know about just a letter? is t a word? I know I is. ;)

Title: Re: WordCracker
Post by: SMcNeill on October 08, 2018, 04:27:12 pm

Large dictionaries list each letter as a single-letter word. Each such word is defined as a noun, denoting the letter with which it is spelled.

"Psychology" starts with a p.

*********************

What if Q from Star Trek took the L train to the T intersection of C and D streets because he had a map where an X marked that spot? Would he get an A for following directions?

Title: Re: WordCracker
Post by: codeguy on October 08, 2018, 04:41:19 pm

You ARE using the original wordlist.txt from the original thread post (https://www.qb64.org/forum/index.php?action=dlattach;topic=679.0;attach=1619), right? I got 361 unique matches using "stevemcneill" as the original string on my modified code. It was considerably slower, BUT it does EXHAUSTIVE permutations of every letter, which leads to quite large numbers: the size of the job increases as a factorial of the string length.
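
To put rough numbers on that factorial growth, here is a small sketch that prints n! for a few string lengths. It only counts full-length orderings and ignores repeated letters, so treat it as an upper bound on the work and not as a description of the modified program:

```qb64
DIM f AS _INTEGER64 ' 64-bit so the running product stays exact
DIM n AS LONG
f = 1
FOR n = 1 TO 12
    f = f * n ' running factorial: f = n!
    IF n >= 8 THEN PRINT n; "letters ->"; f; "orderings"
NEXT n
```

Nine letters already give 362,880 orderings, and the 12 letters of "stevemcneill" give 479,001,600 before duplicates are taken out.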

Title: Re: WordCracker
Post by: bplus on October 08, 2018, 04:52:37 pm

Hi codeguy,

Is your code geared to handle the triple e and double l in stevemcneill? This is why I thought "nah!" for permutations.

Hi Steve,

You sure know how to drive home a point, N, S, E and W! ;)

Title: Re: WordCracker
Post by: codeguy on October 08, 2018, 05:23:50 pm

No, mine does unconditional permutations. Which is why it may have detected 3 more words than other submissions. But it does generate matches in lexical (alphabetical) order, saving sorting. For 8 characters or less, the speed is VERY similar to Steve's algorithm, running 8+ times to completion in the 5 second limit. Mine in 8 character mode, runs about .60s or less using the complete wordlist.txt document.

Title: Re: WordCracker
Post by: SMcNeill on October 08, 2018, 06:50:09 pm

My entry into the field of long-arse string searches:

Code: QB64: [Select]

SCREEN _NEWIMAGE(800, 600, 32)
CONST In$ = "thequickbrownfoxjumpedoverthelazydogdreamingofcountingsheepthinkingofbrownfoxjumpingoverdogs"
 
InLen = LEN(in$)
 
 
OPEN "WordList.txt" FOR BINARY AS #1
DO UNTIL EOF(1)
    LINE INPUT #1, junk$
    WordCount = WordCount + 1
    IF LEN(junk$) > maxlength THEN maxlength = LEN(junk$)
LOOP
SEEK 1, 1
 
TYPE WordLetters
    letter AS _UNSIGNED _BYTE
    count AS _UNSIGNED _BYTE
END TYPE
DIM Words(WordCount) AS STRING, Match(WordCount) AS LONG
DIM WordLetters(WordCount, 26) AS WordLetters
DIM Letters(97 TO 122) AS _UNSIGNED _BYTE
 
FOR i = 1 TO WordCount 'Load and prepare our word list and wordsearch array
    LINE INPUT #1, Words(i)
    FOR j = 1 TO LEN(Words(i))
        a = ASC(Words(i), j)
        IF a > 96 AND a < 123 THEN
            afound = 0
            FOR k = 1 TO WordLetters(i, 0).count
                IF a = WordLetters(i, k).letter THEN afound = k: EXIT FOR
            NEXT
            IF afound THEN
                WordLetters(i, afound).count = WordLetters(i, afound).count + 1 'add to the count of an existing letter
            ELSE
                WordLetters(i, 0).count = WordLetters(i, 0).count + 1 'increase the total count of letters
                w = WordLetters(i, 0).count
                WordLetters(i, w).letter = a 'save the letter
                WordLetters(i, w).count = 1 'and count it 1 for the first time it appeared
            END IF
        ELSE
            WordLetters(i, 0).count = 0 'invalidate words with non A-Z letters
            EXIT FOR
        END IF
    NEXT
NEXT
 
FOR i = 1 TO InLen 'get the letters for the search string
    a = (ASC(in$, i) OR 32)
    Letters(a) = Letters(a) + 1
NEXT
 
 
t# = TIMER
DO UNTIL TIMER > t# + 5
    loopcount = loopcount + 1
    wordfound = 0 'reset our counter
 
    'THE MAIN SEARCH ROUTINE HERE
 
    FOR i = 1 TO WordCount 'now check every word for a match
        sl = WordLetters(i, 0).count 'search limit
        IF sl = 0 THEN GOTO invalid
        FOR j = 1 TO sl
            l = WordLetters(i, j).letter 'the letter in the word
            IF Letters(l) < WordLetters(i, j).count THEN GOTO invalid 'it's impossible
        NEXT
        wordfound = wordfound + 1
        Match(wordfound) = i
        invalid:
    NEXT
 
    'END OF THE MAIN SEARCH ROUTINE
 
LOOP
 
'Print Results
FOR i = 1 TO wordfound
    PRINT Words(Match(i));
    IF i < wordfound THEN PRINT ","; ELSE PRINT
NEXT
PRINT
PRINT "DONE, with"; wordfound; "matches, with a speed of"; loopcount; "runs in 5 seconds."
 
END
 
'And, if you want to see the secret behind this method, here's how we store and retrieve our data:
 
FOR i = 1 TO 10 'I think just showing how we track 10 words should be fine enough
    PRINT Words(i),
    FOR j = 1 TO WordLetters(i, 0).count
        PRINT CHR$(WordLetters(i, j).letter);
        PRINT WordLetters(i, j).count;
        PRINT ",";
    NEXT
    PRINT
NEXT
 
SLEEP
 
'and here's the in$ and how its letter count holds up
 
 
PRINT in$
FOR i = 97 TO 122
    PRINT Letters(i);
NEXT
PRINT
 

Instead of 32-37 runs in 5 seconds, this does about 150 on my machine.

If you're curious how it does it, remove the END statement and then let it run. It'll show you how we basically break down our words so we never have to check the letters in them any more times than is absolutely necessary to make a match. ;)

Title: Re: WordCracker
Post by: SMcNeill on October 08, 2018, 07:30:11 pm

And, a slight tweak to do some pre-sorting based on In$ size and non-A-Z characters, and to add a key$ to make certain that the words all contain a valid letter we designate, as per the original post's requirements:

Code: QB64: [Select]

SCREEN _NEWIMAGE(800, 600, 32)
CONST In$ = "thequickbrownfoxjumpedoverthelazydogdreamingofcountingsheepthinkingofbrownfoxjumpingoverdogs"
'CONST In$ = "abcdefghi"
'key$ = "a" 'make "" or remark this line out completely to do a search without a required key, as per the original post
InLen = LEN(In$)
 
 
OPEN "WordList.txt" FOR BINARY AS #1
DO UNTIL EOF(1)
    LINE INPUT #1, junk$
    WordCount = WordCount + 1
    IF LEN(junk$) > maxlength THEN maxlength = LEN(junk$)
LOOP
SEEK 1, 1
 
TYPE WordLetters
    letter AS _UNSIGNED _BYTE
    count AS _UNSIGNED _BYTE
END TYPE
DIM Words(WordCount) AS STRING, Match(WordCount) AS LONG
DIM WordLetters(WordCount, 17) AS WordLetters
DIM Letters(97 TO 122) AS _UNSIGNED _BYTE
 
 
FOR i = 1 TO WordCount
    wc = wc + 1
    LINE INPUT #1, Words(wc)
    IF LEN(Words(wc)) = 0 OR LEN(Words(wc)) > InLen THEN 'remove blank lines and words too long to fit
        wc = wc - 1
    ELSE
        FOR j = 1 TO LEN(Words(wc)) 'remove non A-Z words as we're only searching for words with those letters
            IF ASC(Words(wc), j) < 97 OR ASC(Words(wc), j) > 122 THEN wc = wc - 1: EXIT FOR
        NEXT
    END IF
NEXT
WordCount = wc
 
FOR i = 1 TO WordCount 'Load and prepare our word list and wordsearch array
    FOR j = 1 TO LEN(Words(i))
        a = ASC(Words(i), j)
        afound = 0
        FOR k = 1 TO WordLetters(i, 0).count
            IF a = WordLetters(i, k).letter THEN afound = k: EXIT FOR
        NEXT
        IF afound THEN
            WordLetters(i, afound).count = WordLetters(i, afound).count + 1 'add to the count of an existing letter
        ELSE
            WordLetters(i, 0).count = WordLetters(i, 0).count + 1 'increase the total count of letters
            w = WordLetters(i, 0).count
            WordLetters(i, w).letter = a 'save the letter
            WordLetters(i, w).count = 1 'and count it 1 for the first time it appeared
        END IF
    NEXT
NEXT
 
FOR i = 1 TO InLen 'get the letters for the search string
    a = (ASC(In$, i) OR 32)
    Letters(a) = Letters(a) + 1
NEXT
 
 
t# = TIMER
DO UNTIL TIMER > t# + 5
    loopcount = loopcount + 1
    wordfound = 0 'reset our counter
 
    'THE MAIN SEARCH ROUTINE HERE
 
    FOR i = 1 TO WordCount 'now check every word for a match
        sl = WordLetters(i, 0).count 'search limit
        FOR j = 1 TO sl
            l = WordLetters(i, j).letter 'the letter in the word
            IF Letters(l) < WordLetters(i, j).count THEN GOTO invalid 'it's impossible
        NEXT
        IF key$ = "" THEN
            wordfound = wordfound + 1
            Match(wordfound) = i
        ELSEIF INSTR(Words(i), key$) <> 0 THEN
            wordfound = wordfound + 1
            Match(wordfound) = i
        END IF
        invalid:
    NEXT
 
    'END OF THE MAIN SEARCH ROUTINE
 
LOOP
 
 
'Print Results
FOR i = 1 TO wordfound
    PRINT Words(Match(i));
    IF i < wordfound THEN PRINT ","; ELSE PRINT
NEXT
PRINT
PRINT "DONE, with"; wordfound; "matches, with a speed of"; loopcount; "runs in 5 seconds."
 
SLEEP
CLS
 
 
 
 
'And, if you want to see the secret behind this method, here's how we store and retrieve our data:
 
FOR i = 1 TO 10 'I think just showing how we track 10 words should be fine enough
    PRINT Words(i),
    FOR j = 1 TO WordLetters(i, 0).count
        PRINT CHR$(WordLetters(i, j).letter);
        PRINT WordLetters(i, j).count;
        PRINT ",",
    NEXT
    PRINT
NEXT
 
SLEEP
 
'and here's the in$ and how its letter count holds up
 
 
PRINT In$
FOR i = 97 TO 122
    PRINT CHR$(i); LTRIM$(RTRIM$(STR$(Letters(i))));
    IF i < 122 THEN PRINT ", "; ELSE PRINT
NEXT
PRINT

We now run nice and fast for both short and long strings. We can use a preset qualifier key if we want to, but we don't have to. Run times are ~900 loops for "abcdefghi" with a key of "a", and ~200 times with the long entry Bplus used in his test routine earlier.

Title: Re: WordCracker
Post by: bplus on October 08, 2018, 07:34:55 pm

Yah Steve, 140 loops on my machine fabulous. Poor Peter, you robbed him to pay Paul. ;-))
(Steve, this comment based on your previous code post, you posted another while I was getting list ready for codeguy.)

Hi code guy,

Here is my list of 392 words from stevemcneill from the given WordList.txt file posted earlier in thread.
Hope this helps you find differences in lists.

PS what is unconditional permutation?

Title: Re: WordCracker
Post by: bplus on October 08, 2018, 07:59:49 pm

Some time ago, I built a Word Search Solver and had thought about making a Word Search Puzzle Builder.

With this Word Crack code review, I am rethinking building a Word Search Puzzle Builder based on a longish quote. Hmm... a few more details to work out... but I have a good word list generator now, and if you throw in a lot of extra copies of the same letters, it could be quite a challenge, like getting a puzzle where all the pieces look the same.

Title: Re: WordCracker
Post by: SMcNeill on October 08, 2018, 08:19:52 pm

And for those who are having trouble understanding how my last 2 examples work, the secret is in the letter counter.

Take APPLE as an example... Instead of searching to see if the In$ has those 5 letters, we instead count the letters first, when we load the dictionary. 1 A, 2 P, 1 L, 1 E... All we have to check is a maximum of 4 times, instead of 5; we just check to make certain In$ has 2 Ps in it once, rather than checking twice...

In some cases, this reduces a ton of letter checks. MISSISSIPPI. Instead of 11 letters, we check 4... (1 M, 4 I, 4 S, 2 P).

Minimal conditional checking makes for maximum performance. ;)
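
The counting idea can be sketched in a few lines. This is a simplified version, not the code posted above: it keeps a plain 26-slot count per word and compares all 26 slots, where the posted programs store only the distinct letters of each word. The names `have&()` and `CanBuild%` are made up for the example.

```qb64
DIM SHARED have&(97 TO 122) ' how many of each letter the input offers

src$ = "mississippi" ' assumed input string, lowercase a-z only
FOR i& = 1 TO LEN(src$)
    a& = ASC(src$, i&)
    have&(a&) = have&(a&) + 1
NEXT i&

PRINT CanBuild%("miss"), CanBuild%("apple")
END

FUNCTION CanBuild% (w$)
    DIM need&(97 TO 122)
    ' Count each letter of the candidate word once...
    FOR i& = 1 TO LEN(w$)
        a& = ASC(w$, i&)
        need&(a&) = need&(a&) + 1
    NEXT i&
    ' ...then make one comparison per letter instead of one per character.
    CanBuild% = -1
    FOR a& = 97 TO 122
        IF need&(a&) > have&(a&) THEN CanBuild% = 0: EXIT FUNCTION
    NEXT a&
END FUNCTION
```

It prints -1 (true) for "miss" and 0 for "apple", since the input has no "a".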

Title: Re: WordCracker
Post by: bplus on October 08, 2018, 08:32:47 pm

PLUS, if you want to save even more time, save the processed Word List back into a new Data File, once and forever preprocessed!

Peter smiles, he ain't feel'in so broke no more.
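
A rough sketch of that save-it-once idea, assuming WordList.txt sits in the working directory and using a made-up output name, WordCounts.txt. Each kept line is written as the word, a comma, and 26 digits giving the count of a to z. A letter repeated more than 9 times would run past "9", which this sketch does not handle.

```qb64
' WordCounts.txt is a hypothetical output file name.
OPEN "WordList.txt" FOR BINARY AS #1
OPEN "WordCounts.txt" FOR OUTPUT AS #2
DO UNTIL EOF(1)
    LINE INPUT #1, w$
    k$ = STRING$(26, "0") ' one digit per letter a-z
    ok% = LEN(w$) > 0 ' blank lines are dropped
    FOR i& = 1 TO LEN(w$)
        a& = ASC(w$, i&)
        IF a& < 97 OR a& > 122 THEN ok% = 0: EXIT FOR ' drop non a-z words
        p& = a& - 96
        MID$(k$, p&, 1) = CHR$(ASC(k$, p&) + 1) ' bump that letter's digit
    NEXT i&
    IF ok% THEN PRINT #2, w$; ","; k$
LOOP
CLOSE #1, #2
```

A later run could then read the counts straight from the new file instead of recounting every word.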

Title: Re: WordCracker
Post by: codeguy on October 08, 2018, 08:49:01 pm

Unconditional permutation means I perform no checking as to whether the next generated permutation actually fits the pattern of the language. In essence, it will check for permutations of words beginning with rx or some other beginning consonant pair that does not appear in standard language. This is handy for finding abbreviations like RPM or GPS or even TLDR, which are not words in the standard sense, but are included in the English language as shorthand for Revolutions Per Minute, Global Positioning System and Too Long; Didn't Read. Even stuff like RTFM, whose translation I will leave to the reader.

While my method is slow for very large strings, it is absolutely thorough and can work for languages that are not English too. Slow? Yes, it is slow, but it will not skip any possible permutation, so it cannot miss a word that is in the word list. This is why it's considered an exhaustive algorithm. Also, my method eliminates any chance of repeating words already found.

BTW, my name has exactly 871 matching words in wordlist.txt, among them Sanhedrin, Idaho and ordinals. Weird, huh? But for words of 8 characters or less, using the permutation method and my searching algorithm is competitive with Steve's work. This word list does not contain axolotl, a kind of salamander (301 words in wordlist.txt), but 32 words contained in axolotl were found. With my algorithm, if it's in there, this will find it AND give results in sorted order.

Title: Re: WordCracker
Post by: bplus on October 08, 2018, 09:05:36 pm

Thanks codeguy,

As the man said, "Minimal conditional checking makes for maximum performance. ;) "

Oh hey, I've got to try this with my middle name too, 1106! Including roman, sagamen.

Title: Re: WordCracker
Post by: SMcNeill on October 08, 2018, 09:33:10 pm

Quote from: bplus on October 08, 2018, 08:32:47 pm

PLUS, if you want to save even more time, save the processed Word List back into a new Data File, once and forever preprocessed!

Peter smiles, he ain't feel'in so broke no more.

If I were to save it as a processed list, I'd sort it first to put maximum letters first.

Apple has 2Ps and only one of every other letter. Since double letters are rarer than single letters, if we check for the PP first, we're more likely to be able to skip the rest of the checks.

So, instead of 1A2P1L1E, I'd store it as 2P1A1E1L....

Take ELEVEN for a perfect example. It's rare to see 3 Es in a word, but not so rare to find a single L, V, or N. Check the Es first and chances are you can skip all the other letters completely.

***************

Edit: Scrabble letter values would actually be a good criterion for sorting order. Z, X, Q at the front of the search list, A, E, I, O, U, S, T, and such as the last things compared.

I imagine you could eke out a considerable performance boost with minimal effort, implementing such a method.
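
As a sketch of the "biggest count first" ordering only (the Scrabble-value ordering is not attempted here), the following builds the letter/count pairs for one word and moves the higher counts to the front. The array names are made up for the example.

```qb64
w$ = "eleven"
DIM ltr&(1 TO 26), cnt&(1 TO 26)
pairs& = 0

' Collect distinct letters and their counts, in order of first appearance.
FOR i& = 1 TO LEN(w$)
    a& = ASC(w$, i&)
    found& = 0
    FOR k& = 1 TO pairs&
        IF ltr&(k&) = a& THEN found& = k&: EXIT FOR
    NEXT k&
    IF found& THEN
        cnt&(found&) = cnt&(found&) + 1
    ELSE
        pairs& = pairs& + 1
        ltr&(pairs&) = a&: cnt&(pairs&) = 1
    END IF
NEXT i&

' Insertion-style pass: bubble each pair forward past smaller counts.
FOR i& = 2 TO pairs&
    FOR k& = i& TO 2 STEP -1
        IF cnt&(k&) > cnt&(k& - 1) THEN
            SWAP cnt&(k&), cnt&(k& - 1)
            SWAP ltr&(k&), ltr&(k& - 1)
        ELSE
            EXIT FOR
        END IF
    NEXT k&
NEXT i&

FOR k& = 1 TO pairs&
    PRINT LTRIM$(STR$(cnt&(k&))); CHR$(ltr&(k&));
NEXT k&
PRINT
```

For "eleven" this prints 3e1l1v1n, so the triple E is the first thing a search would test.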

Title: Re: WordCracker
Post by: codeguy on October 08, 2018, 11:10:34 pm

Steve, I was really impressed with your speedy performance, so I took the liberty of modifying it for exact same results and significantly faster performance.

Code: QB64: [Select]

 
SCREEN _NEWIMAGE(800, 600, 32)
CONST In$ = "thequickbrownfoxjumpedoverthelazydogdreamingofcountingsheepthinkingofbrownfoxjumpingoverdogs"
DIM wordcount AS LONG: wordcount = 0
DIM maxlength AS LONG: maxlength = 0
InLen = LEN(In$)
 
 
OPEN "WordList.txt" FOR BINARY AS #1
DO UNTIL EOF(1)
    LINE INPUT #1, junk$
    wordcount = wordcount + 1
    IF LEN(junk$) > maxlength THEN maxlength = LEN(junk$)
LOOP
SEEK 1, 1
 
TYPE WordLetters
    letter AS _UNSIGNED _BYTE
    count AS _UNSIGNED _BYTE
END TYPE
DIM Words(wordcount) AS STRING, Match(wordcount) AS LONG
DIM WordLetters(wordcount, 26) AS WordLetters
DIM Letters(97 TO 122) AS _UNSIGNED _BYTE
DIM ascii AS _UNSIGNED _BYTE
DIM KIterWordLettersCount AS LONG
DIM afound AS LONG: afound = 0
DIM w AS LONG
DIM i AS LONG
DIM j AS LONG
FOR i = 1 TO wordcount 'Load and prepare our word list and wordsearch array
    LINE INPUT #1, Words(i)
    FOR j = 1 TO LEN(Words(i))
        ascii = ASC(Words(i), j)
        SELECT CASE ascii
            CASE 97 TO 122
                FOR KIterWordLettersCount = 1 TO WordLetters(i, 0).count
                    IF ascii = WordLetters(i, KIterWordLettersCount).letter THEN afound = KIterWordLettersCount: EXIT FOR
                NEXT
                IF afound THEN
                    WordLetters(i, afound).count = WordLetters(i, afound).count + 1 'add to the count of an existing letter
                    afound = 0
                ELSE
                    WordLetters(i, 0).count = WordLetters(i, 0).count + 1 'increase the total count of letters
                    w = WordLetters(i, 0).count
                    WordLetters(i, w).letter = ascii 'save the letter
                    WordLetters(i, w).count = 1 'and count it 1 for the first time it appeared
                END IF
            CASE ELSE
                WordLetters(i, 0).count = 0 'invalidate words with non A-Z letters
                EXIT FOR
        END SELECT
    NEXT
NEXT
 
FOR i = 1 TO InLen 'get the letters for the search string
    ascii = (ASC(In$, i) OR 32)
    Letters(ascii) = Letters(ascii) + 1
NEXT
 
DIM sl AS LONG
DIM wordfound AS LONG
t# = TIMER(.001)
DO UNTIL TIMER > t# + 5
    loopcount = loopcount + 1
    wordfound = 0 'reset our counter
 
    'THE MAIN SEARCH ROUTINE HERE
 
    FOR i = 1 TO wordcount 'now check every word for a match
        sl = WordLetters(i, 0).count 'search limit
        IF sl THEN
            FOR j = 1 TO sl
                l = WordLetters(i, j).letter 'the letter in the word
                IF Letters(l) < WordLetters(i, j).count THEN GOTO invalid 'it's impossible
            NEXT
            wordfound = wordfound + 1
            Match(wordfound) = i
        END IF
        invalid:
    NEXT
 
    'END OF THE MAIN SEARCH ROUTINE
 
LOOP
 
'Print Results
FOR i = 1 TO wordfound
    PRINT Words(Match(i));
    IF i < wordfound THEN PRINT ","; ELSE PRINT
NEXT
PRINT
PRINT "DONE, with"; wordfound; "matches, with a speed of"; loopcount; "runs in 5 seconds."
 
END
 
'And, if you want to see the secret behind this method, here's how we store and retrieve our data:
 
FOR i = 1 TO 10 'I think just showing how we track 10 words should be fine enough
    PRINT Words(i),
    FOR j = 1 TO WordLetters(i, 0).count
        PRINT CHR$(WordLetters(i, j).letter);
        PRINT WordLetters(i, j).count;
        PRINT ",";
    NEXT
    PRINT
NEXT
 
SLEEP
 
'and here's the in$ and how its letter count holds up
 
 
PRINT In$
FOR i = 97 TO 122
    PRINT Letters(i);
NEXT
PRINT
 
 

On my humble machine, this represents a jump from 45 loops/5s to 80+ loops/5s. Awesome work, Steve.

Title: Re: WordCracker
Post by: SMcNeill on October 09, 2018, 12:13:36 am

Thanks for the kind words, Codeguy. ;)

I've worked with datasets like this thousands of times in the past, so I've learned a few tricks for making them run efficiently. The above was "speedy enough" for most needs, but there are methods quite a bit faster we could employ -- if we wanted to put forth the effort and alter our data somewhat.

Absolute fastest method I can imagine is by dividing our data into a tree structure....

For example, let's start with this tree:

A
AA
AAA

The first 3 entries on our list are those three. By "treeing" our data, we say, "If I don't have A in the search phrase, then I can't have anything below A"

Eliminate "A" and we eliminate EVERYTHING with an A. Our search list just dropped 50k words.

If we have A, but not AA, we've eliminated all words with AA from our search list...

It's a "cascading elimination" scheme and it's efficient, and fast, as heck!

The main issue with it is generating the lookup table to begin with... Your data would need to be stored in a similar manner to this:
A (the word), 52154 (number of words with this eliminator), 2,3,4,5,6.... (Word list)
AA (next word), 2154 (number of words with this eliminator), 3,44,67,87,... (Word list)

**********************

It would bloat our data file considerably, depending on how many "eliminators" we want to use (why use anything more than 2 digits? Longer words get more unique, the longer they become.), but it'd reduce our list of possible words to check by huge chunks at a time...

Fastest method I can think of, at the moment anyway. ;)

(And if you look at my previous code, you can see where I was already generating lists which we could use for elimination purposes for single letters back with the original code in message #18.)
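
A minimal sketch of the first level of that elimination, using single letters only and a made-up five-word list in place of the dictionary. The AA and AAA levels described above are not covered, and the array names are assumptions for the example.

```qb64
' Hypothetical word list standing in for the dictionary.
DIM word$(1 TO 5), dead%(1 TO 5)
word$(1) = "cab": word$(2) = "bed": word$(3) = "dig": word$(4) = "fig": word$(5) = "zebra"
DIM idx&(97 TO 122, 0 TO 5) ' idx&(letter, 0) = how many words hold that letter

' Build the lookup table once: which words contain each letter.
FOR n& = 1 TO 5
    FOR a& = 97 TO 122
        IF INSTR(word$(n&), CHR$(a&)) THEN
            idx&(a&, 0) = idx&(a&, 0) + 1
            idx&(a&, idx&(a&, 0)) = n&
        END IF
    NEXT a&
NEXT n&

src$ = "abcdefghi" ' assumed search letters
' Any letter missing from src$ knocks out its whole list in one sweep.
FOR a& = 97 TO 122
    IF INSTR(src$, CHR$(a&)) = 0 THEN
        FOR k& = 1 TO idx&(a&, 0)
            dead%(idx&(a&, k&)) = -1
        NEXT k&
    END IF
NEXT a&

FOR n& = 1 TO 5
    IF NOT dead%(n&) THEN PRINT word$(n&); " still needs a full letter check"
NEXT n&
```

Here "zebra" is dropped without ever being compared letter by letter, and only the four survivors go on to the count check.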

Title: Re: WordCracker
Post by: bplus on October 10, 2018, 01:39:23 pm

This letter-count formula thing obviously lends itself to anagrams, so yesterday I started modifying the file with BFormulas, a 26 chr$() string of counts for a, b, c... All day long, as I proceeded through code tests, I had the strongest feeling of deja vu that I had done this before, that we, Steve too, had worked through this before, probably at Walter's forum. I checked through old code posts 3 times and did not find anything on anagrams, so I kept going reluctantly, because it became more and more clear we had done this before.

Finally, late last night, I found the old code posted under the Rosetta folder! Yes, that's right, because it started from a challenge from the Rosetta stuff we were doing. Dang! I had failed to remember the biggest hint of all: sort by the BFormula$ key. Man, what a time saver it is doing it that way, like the blink of an eye!

So if you find a word in your name, or you are word building from a string, every anagram of it is automatically included. So along with filing the WordList with the BFormula key at the start to save a lot of time, some more time could be saved by listing all the anagrams that come with each BFormula! The WordList was reduced by 6,500+ lines by listing anagrams. Can't wait to try timed tests for generating word lists.
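
A small sketch of such a key, assuming it is simply 26 characters with CHR$(count) for each of a to z. The helper name `BKey$` is made up here, and the old Rosetta code may well have built its key differently.

```qb64
PRINT BKey$("listen") = BKey$("silent") ' same counts, so anagrams
PRINT BKey$("listen") = BKey$("list") ' different counts
END

' Hypothetical helper: a 26-character key, one CHR$(count) per letter a-z.
FUNCTION BKey$ (w$)
    k$ = STRING$(26, 0)
    FOR i& = 1 TO LEN(w$)
        p& = (ASC(w$, i&) OR 32) - 96 ' fold to lowercase, a = 1 ... z = 26
        IF p& >= 1 AND p& <= 26 THEN MID$(k$, p&, 1) = CHR$(ASC(k$, p&) + 1)
    NEXT i&
    BKey$ = k$
END FUNCTION
```

The first line prints -1 and the second prints 0. Sorting the word list by this key would put every group of anagrams on adjacent lines.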
