Author Topic: WordCracker  (Read 14913 times)

0 Members and 1 Guest are viewing this topic.

Offline Zeppelin

  • Newbie
  • Posts: 43
    • View Profile
    • Zeppelin Games ItchIo
Re: WordCracker
« Reply #15 on: October 08, 2018, 01:11:38 am »
Wow. Thanks everyone for their responses. Thanks for the performance tips (It really helps. I still learning).
Its nice to see everyone trying to out-do each others programs.

I have to say SMcNeill..... what do you do with your life.

Just joking man. Unbelievably fast program by the way.

Thanks everyone,
Zeppelin
+[--->++<]>+.+++[->++++<]>.[--->+<]>+.-[---->+<]>++.+[->+++<]>+.+++++++++++.----------.[--->+<]>----.+[---->+<]>+++.---[->++++<]>.------------.+.++++++++++.+[---->+<]>+++.+[->+++<]>++.[--->+<]>+.[->+++<]>+.++++++++++.+.>++++++++++.

Offline bplus

  • Global Moderator
  • Forum Resident
  • Posts: 8053
  • b = b + ...
    • View Profile
Re: WordCracker
« Reply #16 on: October 08, 2018, 10:10:05 am »
Made some more improvements like getting rid of blank word and preprocessed file according to number of letters input to build words from. "abcdefghi" now runs around 518 loops in 5 secs and finds 351 words from "Steve McNeil" input. Steve's code crashes if "Steve McNeil" is input. ;-))
Code: QB64: [Select]
  1. _TITLE "WordCrack mod 2 with preprocessing.bas B+ 2018-10-08"
  2.  
  3. ' now with timer mod and preprocessing of word file to length of in$
  4.  
  5. 'INPUT "Enter a string to build words from "; inp$
  6. inp$ = "Steve McNeil"
  7. in$ = LCASE$(inp$)
  8. lenin = LEN(in$)
  9.  
  10. 'load words
  11. OPEN "WordList.txt" FOR BINARY AS #1
  12. gulp& = LOF(1)
  13. buff$ = STRING$(gulp&, " ")
  14. GET #1, , buff$
  15. REDIM longWords$(0)
  16. Split buff$, CHR$(10), longWords$()
  17.  
  18. 'preprocess word list, use only words <= the build words string
  19. DIM word$(84100)
  20. FOR i& = 0 TO UBOUND(longWords$)
  21.     IF LEN(longWords$(i&)) <= lenin THEN
  22.         IF LTRIM$(longWords$(i&)) <> "" THEN
  23.             IF ASC(longWords$(i&)) > 96 AND ASC(longWords$(i&)) < 123 THEN
  24.                 wi& = wi& + 1
  25.                 word$(wi&) = longWords$(i&)
  26.             END IF
  27.         END IF
  28.     END IF
  29.  
  30. 'NOW do the loop count, should add lots of loops checking one less thing per loop and looping less times
  31. start! = TIMER
  32. WHILE TIMER - start! < 5
  33.     'RUN THROUGH WORDS
  34.     foundWords% = 0
  35.     FOR n& = 0 TO wi&
  36.         c$ = in$
  37.         OK% = -1
  38.         FOR i% = 1 TO LEN(word$(n&))
  39.             p% = INSTR(c$, MID$(word$(n&), i%, 1))
  40.             IF p% = 0 THEN
  41.                 OK% = 0: EXIT FOR
  42.             ELSE
  43.                 MID$(c$, p%, 1) = "+"
  44.             END IF
  45.         NEXT
  46.         IF OK% THEN foundWords% = foundWords% + 1: PRINT word$(n&); ", ";
  47.     NEXT
  48.     counter% = counter% + 1
  49. PRINT: PRINT: PRINT "Found:"; foundWords%; "words in "; CHR$(34); inp$; CHR$(34); ","; counter%; "times in 5 secs."
  50.  
  51. SUB Split (mystr AS STRING, delim AS STRING, arr() AS STRING)
  52.     ' bplus modifications of Galleon fix of Bulrush Split reply #13
  53.     ' http://www.[abandoned, outdated and now likely malicious qb64 dot net website - don’t go there]/forum/index.php?topic=1612.0
  54.     ' this sub further developed and tested here: \test\Strings\Split test.bas
  55.     DIM copy AS STRING, p AS LONG, curpos AS LONG, arrpos AS LONG, lc AS LONG, dpos AS LONG
  56.     copy = mystr 'make copy since we are messing with mystr
  57.     'special case if delim is space, probably want to remove all excess space
  58.     IF delim = " " THEN
  59.         copy = RTRIM$(LTRIM$(copy))
  60.         p = INSTR(copy, "  ")
  61.         WHILE p > 0
  62.             copy = MID$(copy, 1, p - 1) + MID$(copy, p + 1)
  63.             p = INSTR(copy, "  ")
  64.         WEND
  65.     END IF
  66.     curpos = 1
  67.     arrpos = 0
  68.     lc = LEN(copy)
  69.     dpos = INSTR(curpos, copy, delim)
  70.     DO UNTIL dpos = 0
  71.         arr(arrpos) = MID$(copy, curpos, dpos - curpos)
  72.         arrpos = arrpos + 1
  73.         REDIM _PRESERVE arr(arrpos + 1) AS STRING
  74.         curpos = dpos + LEN(delim)
  75.         dpos = INSTR(curpos, copy, delim)
  76.     LOOP
  77.     arr(arrpos) = MID$(copy, curpos)
  78.     REDIM _PRESERVE arr(arrpos) AS STRING
  79.  

PS codeguy's code is going to crash also if only permutations up to 8 letters is allowed, without even checking I can see that.

Less than 4 hours to go. Will Steve fix his code in time? Stay tuned...  ;)
Steve McNeil on Word Crack.PNG
* Steve McNeil on Word Crack.PNG (Filesize: 57.18 KB, Dimensions: 1026x693, Views: 374)
« Last Edit: October 08, 2018, 10:18:15 am by bplus »

Offline bplus

  • Global Moderator
  • Forum Resident
  • Posts: 8053
  • b = b + ...
    • View Profile
Re: WordCracker
« Reply #17 on: October 08, 2018, 10:23:45 am »
Steve wins again! My name only has 271 words in it. :) (but none are vile)

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
    • View Profile
    • Steve’s QB64 Archive Forum
Re: WordCracker
« Reply #18 on: October 08, 2018, 11:27:46 am »
Modified to run with any size string (even "Steve McNeil"; which, btw, is named WRONG.  /cry!!):

Code: QB64: [Select]
  1. WIDTH 80, 50
  2. DEFLNG A-Z
  3. 'SET KEY AND LETTERS
  4. keys = 0 ' ASC("S") OR &B00100000 '0 for no key limiter, otherwise use the ASC("letter") OR &B00100000 to force lowercase value
  5. in$ = LCASE$("SteveMcNeil")
  6. lenin = LEN(in$) '<<<<<<<<<<< for faster processing
  7.  
  8.  
  9. DIM word$(84100)
  10. DIM SHARED ltr(1 TO lenin) '<<<<<<<<<<< shared to debug
  11. DIM backup(1 TO lenin)
  12. DIM keycount(97 TO 122, 60000) 'letter/index
  13. DIM m1 AS _MEM, m2 AS _MEM 'just for quick array restoration
  14. m1 = _MEM(ltr()): m2 = _MEM(backup())
  15.  
  16.  
  17. 'SPLIT UP LETTERS
  18. FOR n = 1 TO lenin
  19.     ltr(n) = ASC(in$, n) OR &B00100000
  20. _MEMPUT m2, m2.OFFSET, ltr()
  21.  
  22.  
  23. 'LOAD ALL WORDS FROM .TXT FILE
  24. PRINT "LOADING...."
  25. OPEN "WordList.txt" FOR BINARY AS #1 '<<<<<<<<< moved to more appropriate place
  26. filecount = 0
  27.     LINE INPUT #1, line$
  28.     IF LEN(line$) <= lenin THEN
  29.         filecount = filecount + 1
  30.         word$(filecount) = line$
  31.         FOR i = 97 TO 122
  32.             IF INSTR(line$, CHR$(i)) THEN
  33.                 keycount(i, 0) = keycount(i, 0) + 1 'record 0 tells us how many there are for that letter
  34.                 keycount(i, keycount(i, 0)) = filecount 'add the word number to the proper index
  35.             END IF
  36.         NEXT
  37.     END IF
  38.  
  39.  
  40. t# = TIMER
  41. timelimit = 5
  42.  
  43.  
  44. DO UNTIL TIMER > t# + timelimit
  45.     CLS
  46.     loopsran = loopsran + 1
  47.     totalcount = 0
  48.  
  49.  
  50.     'RUN THROUGH WORDS
  51.     IF keys = 0 THEN k = filecount ELSE k = keycount(keys, 0)
  52.     FOR n = 1 TO k
  53.         w$ = word$(n)
  54.         FOR i = 1 TO LEN(w$)
  55.             templtr = ASC(w$, i)
  56.             FOR x = 1 TO lenin
  57.                 IF ltr(x) = templtr THEN ltr(x) = 0: GOTO stillvalid
  58.             NEXT x
  59.             GOTO skipmore
  60.             stillvalid:
  61.         NEXT i
  62.  
  63.         totalcount = totalcount + 1
  64.         PRINT w$,
  65.  
  66.         skipmore:
  67.         $CHECKING:OFF
  68.         _MEMPUT m1, m1.OFFSET, backup()
  69.         $CHECKING:ON
  70.         count = 0
  71.     NEXT n
  72.  
  73.     PRINT "DONE..."
  74. PRINT USING "###,###,###,###,### loops ran in ##.# seconds"; loopsran, timelimit
  75. PRINT totalcount; "matches found for "; in$

About 300 loops in 5 seconds with the longer name giving us more letters to check against.  Oddly enough, my list is only counting 350 words for us, and not 351 as Bplus says his is generating.  Is this a glitch in my counting method?  His?  Or is somebody leaving a word out, or including a false positive?   

I dunno!

Some digging will be required to see where that extra word comes from, and what the heck it is!  :P
« Last Edit: October 08, 2018, 11:29:13 am by SMcNeill »
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
    • View Profile
    • Steve’s QB64 Archive Forum
Re: WordCracker
« Reply #19 on: October 08, 2018, 11:41:33 am »
Testing shows that Bplus is wrong.   :D

Compare the 2 file dumps (the first is mine, the second is Bplus's), and it's immediately obvious what the issue is, right at the very top of the file -- a BLANK word.  There's 350 words, with an extra "" at the top of his list.   

Steve wins again!  ;D
* filedump.txt (Filesize: 2.16 KB, Downloads: 285)
* filedump2.txt (Filesize: 2.16 KB, Downloads: 271)
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

Offline bplus

  • Global Moderator
  • Forum Resident
  • Posts: 8053
  • b = b + ...
    • View Profile
Re: WordCracker
« Reply #20 on: October 08, 2018, 12:09:55 pm »
Cra...

Sorry about your name Steve, my eyes are going bad.

Wait,  , isn't a word? ;-))  (where the heck did that come from?)

Thanks for heads up!


Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
    • View Profile
    • Steve’s QB64 Archive Forum
Re: WordCracker
« Reply #21 on: October 08, 2018, 12:44:28 pm »
Cra...

Sorry about your name Steve, my eyes are going bad.

No worries; it's the story of my life...  My family has lived here since forever, so when it was time to name the roads, the powers that be named the road here after my family...

And, spelt it wrong!

No kidding!!

You can find me on McNeil Hill Rd, still laughing at the irony of being immortalized incorrectly.

*****************

And funniest thing?? 

The local government says we can change the name, as long as WE are willing to pay the $$$$$ to change the road signs.

Two little signs, only needing an extra "l" in them, and yet to comply to state standards, they cost over $2600 each!

It's no damn wonder our government is broke, trying to hit such costs just to stay within regulation.  On a low traveled, rural as heck road like mine, a $2.00 slab of wood, $1 can of white paint, and $1 can of black paint for lettering, would be more than sufficient to serve the needs of the community...

And yet...  The sign has to be a certain gauge steel, painted X coats of reflective green, with Y coats of Z-size white paint, mixed with XX percent reflective beads....

(And, just in case you want to see how nutty regulations are for government stuff, here's the regs on signs:  http://www.vdot.virginia.gov/business/resources/TED/final_MUTCD/2013_sup/Revision_1_Part_2_Signs.pdf --- Well, it's the BASIC regulations.  There are 14 supplements, with 5 supplements to supplement the supplements also...l)
« Last Edit: October 08, 2018, 12:58:20 pm by SMcNeill »
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

Offline Cobalt

  • QB64 Developer
  • Forum Resident
  • Posts: 878
  • At 60 I become highly radioactive!
    • View Profile
Re: WordCracker
« Reply #22 on: October 08, 2018, 01:04:32 pm »
And funniest thing?? 

The local government says we can change the name, as long as WE are willing to pay the $$$$$ to change the road signs.

Two little signs, only needing an extra "l" in them, and yet to comply to state standards, they cost over $2600 each!

It's no damn wonder our government is broke, trying to hit such costs just to stay within regulation.  On a low traveled, rural as heck road like mine, a $2.00 slab of wood, $1 can of white paint, and $1 can of black paint for lettering, would be more than sufficient to serve the needs of the community...

And yet...  The sign has to be a certain gauge steel, painted X coats of reflective green, with Y coats of Z-size white paint, mixed with XX percent reflective beads....

(And, just in case you want to see how nutty regulations are for government stuff, here's the regs on signs:  http://www.vdot.virginia.gov/business/resources/TED/final_MUTCD/2013_sup/Revision_1_Part_2_Signs.pdf --- Well, it's the BASIC regulations.  There are 14 supplements, with 5 supplements to supplement the supplements also...l)

just do what the kids do, take some paint and add your own extra 'l', find some spray paint thats lying around it costs you nothing! unless you get caught and fined. or go the extra mile with some masking tape so it looks better!
175 pages, does that meet novel requirements too?
« Last Edit: October 08, 2018, 01:06:44 pm by Cobalt »
Granted after becoming radioactive I only have a half-life!

Offline bplus

  • Global Moderator
  • Forum Resident
  • Posts: 8053
  • b = b + ...
    • View Profile
Re: WordCracker
« Reply #23 on: October 08, 2018, 03:29:34 pm »
Yep, for shorter strings Steve's loop count is a winner!

But for longer strings this current version is a winner.
Code: QB64: [Select]
  1. WIDTH 80, 50
  2. _TITLE "WordCrack mod 2 with preprocessing.bas B+ 2018-10-08"
  3.  
  4. ' now with timer mod and preprocessing of word file to length of in$
  5.  
  6. ' fix the extra "" word being printed   12:28 PM
  7. ' use same print method as Steve's code, change type for wordsFound
  8.  
  9. 'INPUT "Enter a string to build words from "; inp$
  10. inp$ = "thequickbrownfoxjumpedoverthelazydog"
  11. 'inp$ = "thequickbrownfoxjumpedoverthelazydogdreamingofcountingsheep"
  12. 'inp$ = "thequickbrownfoxjumpedoverthelazydogdreamingofcountingsheepthinkingofbrownfoxjumpingoverdogs"
  13. in$ = LCASE$(inp$)
  14. lenin = LEN(in$)
  15.  
  16. 'load words
  17. OPEN "WordList.txt" FOR BINARY AS #1
  18. gulp& = LOF(1)
  19. buff$ = STRING$(gulp&, " ")
  20. GET #1, , buff$
  21. REDIM longWords$(0)
  22. Split buff$, CHR$(10), longWords$()
  23.  
  24. 'preprocess word list, use only words <= the build words string
  25. DIM word$(84100)
  26. FOR i& = 0 TO UBOUND(longWords$)
  27.     IF LEN(longWords$(i&)) <= lenin THEN
  28.         IF LTRIM$(longWords$(i&)) <> "" THEN
  29.             IF ASC(longWords$(i&)) > 96 AND ASC(longWords$(i&)) < 123 THEN
  30.                 wi& = wi& + 1
  31.                 word$(wi&) = longWords$(i&)
  32.             END IF
  33.         END IF
  34.     END IF
  35.  
  36. 'NOW do the loop count, should add lots of loops checking one less thing per loop and looping less times
  37. start! = TIMER
  38. WHILE TIMER - start! < 5
  39.     'RUN THROUGH WORDS
  40.     foundWords& = 0
  41.     FOR n& = 1 TO wi& '<<<<<<<<<<<<<<<<<<<  fixed from 0 to 1
  42.         c$ = in$
  43.         OK% = -1
  44.         FOR i% = 1 TO LEN(word$(n&))
  45.             p% = INSTR(c$, MID$(word$(n&), i%, 1))
  46.             IF p% = 0 THEN
  47.                 OK% = 0: EXIT FOR
  48.             ELSE
  49.                 MID$(c$, p%, 1) = "+"
  50.             END IF
  51.         NEXT
  52.         IF OK% THEN foundWords& = foundWords& + 1: PRINT word$(n&),
  53.     NEXT
  54.     counter& = counter& + 1
  55. PRINT: PRINT: PRINT "Found:"; foundWords&; "words in "; CHR$(34); inp$; CHR$(34); ","; counter&; "times in 5 secs."
  56.  
  57. SUB Split (mystr AS STRING, delim AS STRING, arr() AS STRING)
  58.     ' bplus modifications of Galleon fix of Bulrush Split reply #13
  59.     ' http://www.[abandoned, outdated and now likely malicious qb64 dot net website - don’t go there]/forum/index.php?topic=1612.0
  60.     ' this sub further developed and tested here: \test\Strings\Split test.bas
  61.     DIM copy AS STRING, p AS LONG, curpos AS LONG, arrpos AS LONG, lc AS LONG, dpos AS LONG
  62.     copy = mystr 'make copy since we are messing with mystr
  63.     'special case if delim is space, probably want to remove all excess space
  64.     IF delim = " " THEN
  65.         copy = RTRIM$(LTRIM$(copy))
  66.         p = INSTR(copy, "  ")
  67.         WHILE p > 0
  68.             copy = MID$(copy, 1, p - 1) + MID$(copy, p + 1)
  69.             p = INSTR(copy, "  ")
  70.         WEND
  71.     END IF
  72.     curpos = 1
  73.     arrpos = 0
  74.     lc = LEN(copy)
  75.     dpos = INSTR(curpos, copy, delim)
  76.     DO UNTIL dpos = 0
  77.         arr(arrpos) = MID$(copy, curpos, dpos - curpos)
  78.         arrpos = arrpos + 1
  79.         REDIM _PRESERVE arr(arrpos + 1) AS STRING
  80.         curpos = dpos + LEN(delim)
  81.         dpos = INSTR(curpos, copy, delim)
  82.     LOOP
  83.     arr(arrpos) = MID$(copy, curpos)
  84.     REDIM _PRESERVE arr(arrpos) AS STRING
  85.  
  86.  

inp$ = "thequickbrownfoxjumpedoverthelazydog" > 74 loops versus 65 Steve's
'inp$ = "thequickbrownfoxjumpedoverthelazydogdreamingofcountingsheep" > 41 versus 36-37 Steve's
'inp$ = "thequickbrownfoxjumpedoverthelazydogdreamingofcountingsheepthinkingofbrownfoxjumpingoverdogs" > 37 versus 32-33 Steve's

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
    • View Profile
    • Steve’s QB64 Archive Forum
Re: WordCracker
« Reply #24 on: October 08, 2018, 03:40:47 pm »
Quote
Yep, for shorter strings Steve's loop count is a winner!

But for longer strings this current version is a winner.

When dealing with such long strings, I'd definitely go a different route.  Read letters.  Preset an array.   Just compare letter counts.  I imagine it'd be a ton faster than how we're currently doing it, but that's because the program needs would be much different than originally stated.

I'll play around with it some and see just how quick a routine I can whip up for those longer words/phrases.  :)
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

Offline codeguy

  • Forum Regular
  • Posts: 174
    • View Profile
Re: WordCracker
« Reply #25 on: October 08, 2018, 03:58:27 pm »
I got 361 words matching stevemcneill using the wordlist.txt as included in an attachment for this post. Somewhere, someone is wrong. Mine was modified to accommodate the added characters. Yes, it took a while as I did no optimizations for eliminating impossible prefixes and suffixes.

Offline bplus

  • Global Moderator
  • Forum Resident
  • Posts: 8053
  • b = b + ...
    • View Profile
Re: WordCracker
« Reply #26 on: October 08, 2018, 04:22:34 pm »
Hi codeguy,

For stevemcneill, 2 l's on end, I am getting 392 words with Steve's code and my version.

I don't know about just a letter? is t a word? I know I is. ;)

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
    • View Profile
    • Steve’s QB64 Archive Forum
Re: WordCracker
« Reply #27 on: October 08, 2018, 04:27:12 pm »
Hi codeguy,

For stevemcneill, 2 l's on end, I am getting 392 words with Steve's code and my version.

I don't know about just a letter? is t a word? I know I is. ;)

Large dictionaries list each letter as a single-letter word. Each such word is defined as a noun, denoting the letter with which it is spelled.

"Psychology" starts with a p.

*********************

What if Q from Star Trek took the L train to the T intersection of C and D streets because he had a map where an X marked that spot? Would he get an A for following directions?
« Last Edit: October 08, 2018, 04:32:21 pm by SMcNeill »
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

Offline codeguy

  • Forum Regular
  • Posts: 174
    • View Profile
Re: WordCracker
« Reply #28 on: October 08, 2018, 04:41:19 pm »
You ARE using the original wordlist.txt from the original thread post https://www.qb64.org/forum/index.php?action=dlattach;topic=679.0;attach=1619, right? I got 361 unique matches using "stevemcneill" as the original string on my modified code, which was considerably slower BUT it does EXHAUSTIVE permutations of every letter, leading to quite large numbers, in fact the size of the job increases as a factorial.
« Last Edit: October 08, 2018, 04:43:00 pm by codeguy »

Offline bplus

  • Global Moderator
  • Forum Resident
  • Posts: 8053
  • b = b + ...
    • View Profile
Re: WordCracker
« Reply #29 on: October 08, 2018, 04:52:37 pm »
Hi codeguy,

Is your code geared to handle the triple e and double l in stevemcneill? This is why I thought "nah!" for permutations.

Hi Steve,

You sure know how to drive home a point, N, S, E and W! ;)