I had to do a little pondering to sort out what the heck is going on to make INSTR faster than jumping with the index and size, and what I've come up with is the same old conclusion I've reached in the past: there's a lot of overhead in QB64 functions.
x$ = MID$(s$, i + 1, index) <--MID$ is slower than one would like.
index = ASC(s$, i) ' Good for words 9 letters or less. <--ASC is faster than MID$, but still nothing to write home about.
IF x$ = "dog" THEN k = k + 1 <--And then you do a direct string compare....
VS:
DO
    seed& = INSTR(seed&, s$, ",dog,") <--Check the result
    IF seed& = 0 THEN EXIT DO
    x$ = MID$(s$, seed& + 1, LEN("dog")) <--WTH is this line in here for? It actually does NOTHING for the check...
    k = k + 1
    seed& = seed& + LEN(",dog,")
LOOP
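For anyone not running QB64, the two storage schemes being compared can be sketched in Python. This is a rough port of the logic, not the benchmarked code, and the helper names are mine:

```python
def count_by_length_prefix(packed: bytes, target: bytes) -> int:
    """SIZE/STRING scheme: each word is stored as one size byte
    followed by the word itself -- read the size, compare, jump ahead."""
    count = 0
    i = 0
    while i < len(packed):
        length = packed[i]                          # the size byte (ASC in the QB64 code)
        if packed[i + 1:i + 1 + length] == target:  # MID$-style slice and compare
            count += 1
        i += 1 + length                             # jump straight to the next word
    return count


def count_by_delimiter(text: str, word: str) -> int:
    """Comma-delimited scheme: repeatedly search for ",word,"
    (str.find here plays the role of QB64's INSTR)."""
    needle = "," + word + ","
    count = 0
    pos = text.find(needle)
    while pos != -1:
        count += 1
        # step past the hit, reusing the trailing comma as the next leading comma
        pos = text.find(needle, pos + len(needle) - 1)
    return count
```

The first scan never examines the bytes of a word it skips over; the second scans byte by byte but does it inside one highly optimized search call per hit.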
So, for the comparison to be as fair as possible, we need to remove as much overhead from the program as we can:
CONST Limit = 100 'number of words in our list
CONST Repitition = 10000 'number of times to search the lists

'make an array of the proper sizes because QB64 doesn't
'understand (AS STRING * variable) for a data type
'so we're setting an array for strings of size 1 to 20, just as a quick placeholder for use with mem.

FOR SearchSize = 1 TO 20 STEP 5 'run the list multiple times for various size search strings
    'Generate suitable search string
    search1$ = STRING$(SearchSize, "A")
    search2$ = "," + search1$ + "," 'comma before and after

    FOR i = 1 TO Limit: word(i) = "": NEXT 'reset old list
    FOR WordNumber = 1 TO Limit 'the number of words
        IF WordNumber MOD 10 = 1 THEN 'every 10th word, no matter what, is one we want to look for
            word(WordNumber) = search1$
        ELSE 'otherwise, make the word junk
            FOR i = 1 TO INT(RND * 15) + 1 'up to 15 characters of junk in the spam "words"
                word(WordNumber) = word(WordNumber) + CHR$(INT(RND * 26) + 97)
            NEXT
        END IF
    NEXT

    'Words are now generated. Now let's form our two similar lists for searching.
    list1$ = "": list2$ = ","
    FOR i = 1 TO Limit: list1$ = list1$ + CHR$(LEN(word(i))) + word(i): NEXT 'Size/Word list
    FOR i = 1 TO Limit: list2$ = list2$ + word(i) + ",": NEXT 'Comma Delimited list
    'Wordlists are now built.

    template$ = "##.### seconds to find frequency of " + search1$ + "."
    template$ = template$ + " Frequency = ###"
    _DELAY 1 'a delay so we can watch the tests
    PRINT "Running Speed Tests on "; search1$

    k = 0: l = LEN(search1$): i = 1
    DO
        index = _MEMGET(m, m.OFFSET + i - 1, _UNSIGNED _BYTE) 'Get the index directly, skip ASC function call
        IF index = l THEN 'if lengths don't even match, we don't need to compare words
            'just jump to the next one
            _MEMGET m, m.OFFSET + i, Strings(index) 'get the string direct from memory (like MID$ in Pete's demo)
            IF Strings(index) = search1$ THEN k = k + 1
        END IF
        i = i + index + 1
    LOOP UNTIL i > LEN(list1$)

    k = 0
    DO
        seed& = INSTR(seed&, list2$, search2$)
        IF seed& = 0 THEN EXIT DO
        k = k + 1
        seed& = seed& + LEN(search2$)
    LOOP
NEXT
Things are running so quickly here that we have to run a search on a list of 100 words 10,000 times to generate any significant times for comparison.
CONST Limit = 100 'number of words in our list
CONST Repitition = 10000 'number of times to search the lists
I was curious whether the length of search$ made any real difference, and from my testing it doesn't seem to affect things much.
Notable changes between this routine and yours:
index = _MEMGET(m, m.OFFSET + i - 1, _UNSIGNED _BYTE) 'Get the index directly, skip ASC function call
IF index = l THEN 'if lengths don't even match, we don't need to compare words
'just jump to the next one
_MEMGET m, m.OFFSET + i, Strings(index) 'get the string direct from memory (like MID$ in Pete's demo)
IF Strings(index) = search1$ THEN k = k + 1
END IF
We strip out the use of ASC and MID$, and don't even bother to fetch the word if the two lengths don't match...
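The effect of that length guard can be sketched in Python (my own function name; an illustration of the idea, not the QB64 routine itself):

```python
def count_fast(packed: bytes, target: bytes) -> int:
    """Count occurrences of target in a length-prefixed word list,
    fetching and comparing a word only when its size byte already
    equals len(target) -- mirroring the _MEMGET routine above."""
    want = len(target)
    count = 0
    i = 0
    while i < len(packed):
        length = packed[i]              # read the size byte directly (no ASC call)
        if length == want:              # lengths differ -> skip the fetch and compare
            if packed[i + 1:i + 1 + length] == target:
                count += 1
        i += 1 + length                 # jump straight to the next size byte
    return count
```

On a list like b"\x03dog\x05horse\x03dog", counting b"dog" touches three size bytes but performs only two string compares, since "horse" is ruled out by its length alone.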
And, to keep it fair:
DO
seed& = INSTR(seed&, list2$, search2$)
IF seed& = 0 THEN EXIT DO
k = k + 1
seed& = seed& + LEN(search2$)
LOOP
I basically stripped out that extra line where you were calculating x$ for some odd reason with your search...
Times on my PC are roughly 0.015 seconds for SIZE/STRING storage and lookup, versus 0.025 seconds for INSTR/DELIMITED storage and lookup.
Logic says jumping and skipping searches should be faster than searching byte by byte, but once we start tacking on the overhead associated with our function calls, the gap closes quickly. Without using _MEM to replace ASC and MID$, I couldn't seem to top the speed of INSTR... the overhead was just too great.
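The same overhead story can be checked outside QB64. Here is a Python sketch that rebuilds both lists roughly the way the program above does (every 10th word is the target, the rest is junk) and times both lookups; the absolute numbers won't match the QB64 ones, and which scheme wins can differ by runtime:

```python
import random
import string
import timeit

def count_prefixed(packed: bytes, target: bytes) -> int:
    """SIZE/STRING lookup: jump size byte to size byte, compare only on matching length."""
    want = len(target)
    count = 0
    i = 0
    while i < len(packed):
        length = packed[i]
        if length == want and packed[i + 1:i + 1 + length] == target:
            count += 1
        i += 1 + length
    return count

def count_delimited(text: str, word: str) -> int:
    """INSTR/DELIMITED lookup: repeated substring searches for ",word,"."""
    needle = "," + word + ","
    count = 0
    pos = text.find(needle)
    while pos != -1:
        count += 1
        pos = text.find(needle, pos + len(needle) - 1)
    return count

random.seed(1)  # deterministic junk words
words = ["dog" if i % 10 == 0 else
         "".join(random.choices(string.ascii_lowercase, k=random.randint(1, 15)))
         for i in range(100)]  # 100 words, every 10th is the one we search for
packed = b"".join(bytes([len(w)]) + w.encode() for w in words)
delimited = "," + ",".join(words) + ","

reps = 2000  # scaled down from the 10000 repetitions in the QB64 test
t1 = timeit.timeit(lambda: count_prefixed(packed, b"dog"), number=reps)
t2 = timeit.timeit(lambda: count_delimited(delimited, "dog"), number=reps)
print(f"prefixed: {t1:.3f}s   delimited: {t2:.3f}s")
```

In CPython the delimited version tends to do well because str.find is a single C-level call per hit, while the prefixed scan pays interpreter overhead on every word, which is the same "function overhead closes the gap" effect described above.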
Which now leaves me pondering -- why do the changes I made to the hash table in the post after Ed's make it so much quicker than the previous versions on my machine? I guess that'll be a mystery to sit and study on tomorrow. For now, the bed is calling my name...