Thanks for the kind words, Codeguy. ;)
I've worked with datasets like this thousands of times in the past, so I've learned a few tricks for making them run efficiently. The above was "speedy enough" for most needs, but there are methods quite a bit faster we could employ -- if we wanted to put forth the effort and alter our data somewhat.
The absolute fastest method I can imagine is dividing our data into a tree structure...
For example, let's start with this tree:
A
AA
AAA
Those three are the first three entries on our list. By "treeing" our data, we say, "If I don't have A in the search phrase, then I can't have anything below A."
Eliminate "A" and we eliminate EVERYTHING with an A. Our search list just dropped 50k words.
If we have A, but not AA, we've eliminated all words with AA from our search list...
It's a "cascading elimination" scheme and it's efficient, and fast, as heck!
The main issue with it is generating the lookup table to begin with... Your data would need to be stored in a manner similar to this:
A (the eliminator), 52154 (number of words containing it), 2,3,4,5,6.... (list of word numbers)
AA (next eliminator), 2154 (number of words containing it), 3,44,67,87,... (list of word numbers)
**********************
It would bloat our data file considerably, depending on how many "eliminators" we want to use (why use anything longer than two characters? Words get more unique the longer they become.), but it'd reduce our list of possible words to check by huge chunks at a time...
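If it helps, here's a rough sketch (Python again, hypothetical names, not how my actual data file is built) of generating a table in that "eliminator, count, word list" layout, capped at two-character eliminators as discussed:

from collections import Counter

def build_eliminator_table(words, max_repeat=2):
    # Maps an eliminator like "A" or "AA" to the word numbers containing it.
    table = {}
    for index, word in enumerate(words):
        for letter, count in Counter(word.upper()).items():
            # "A" = one or more A's, "AA" = two or more, capped at max_repeat.
            for repeat in range(1, min(count, max_repeat) + 1):
                table.setdefault(letter * repeat, []).append(index)
    return table

def write_table(table, path):
    # One line per eliminator: eliminator, count, comma-separated word numbers.
    with open(path, "w") as f:
        for eliminator in sorted(table):
            indices = table[eliminator]
            f.write(f"{eliminator}, {len(indices)}, {','.join(map(str, indices))}\n")

Building the table is a one-time cost; after that, each search only ever touches the lists whose eliminators survive.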
Fastest method I can think of, at the moment anyway. ;)
(And if you look back at my previous code in message #18, you can see I was already generating lists for single letters which we could use for elimination purposes.)