Topic: Split and join strings

luke · « **on:** February 15, 2019, 04:11:07 am »

Given a string of words separated by spaces (or any other single character), this splits it into an array of the words. I've no doubt many people have written a version of this over the years, and no doubt there are a million ways to do it, but I thought I'd put mine here so we have at least one version. There's also a join function that does the opposite: array -> single string.

The code is hopefully reasonably self-explanatory, with comments and a little demo. Note that this is akin to Python/JavaScript split/join and PHP explode/implode.

Code: QB64:

redim words$(0)
 
original$ = "The rain   in Spain  "
print "Original string: "; original$
print
 
split original$, " ", words$()
 
print "Words:"
for i = lbound(words$) to ubound(words$)
    print words$(i)
next i
print
 
print "Joined with commas: ";join$(words$(), ",")
 
'Split in$ into pieces, chopping at every occurrence of delimiter$. Multiple consecutive occurrences
'of delimiter$ are treated as a single instance. The chopped pieces are stored in result$().
'
'delimiter$ must be one character long.
'result$() must have been REDIMmed previously.
sub split(in$, delimiter$, result$())
    redim result$(-1)
    start = 1
    do
        while mid$(in$, start, 1) = delimiter$
            start = start + 1
            if start > len(in$) then exit sub
        wend
        finish = instr(start, in$, delimiter$)
        if finish = 0 then finish = len(in$) + 1
        redim _preserve result$(0 to ubound(result$) + 1)
        result$(ubound(result$)) = mid$(in$, start, finish - start)
        start = finish + 1
    loop while start <= len(in$)
end sub
 
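'Combine all elements of in$() into a single string with delimiter$ separating the elements.
'Returns an empty string if in$() has no elements (e.g. split found nothing but delimiters).
function join$(in$(), delimiter$)
    if ubound(in$) < lbound(in$) then exit function
    result$ = in$(lbound(in$))
    for i = lbound(in$) + 1 to ubound(in$)
        result$ = result$ + delimiter$ + in$(i)
    next i
    join$ = result$
end function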

RhoSigma · « **Reply #1 on:** February 15, 2019, 04:29:43 am »

Here are my two cents,

Words/components can be separated by an arbitrary amount of whitespace (TAB/SPACE), and "quoted" sections are taken as one word/component. It also accepts any defined dynamic array, as it REDIMs it as needed.

EDIT:
Old code removed, go to https://qb64forum.alephc.xyz/index.php?topic=4142 for the latest update.

bplus · « **Reply #2 on:** February 15, 2019, 01:38:44 pm »

And my 2 cents:
My Split sub handles multi-character delimiters, which comes in handy for splitting a file into an array of lines with the delimiter set to CHR$(13) + CHR$(10), for a .bas file for instance. It also handles the space delimiter specially by removing all double spaces before splitting.

Oh, also, I like to use long strings as arrays. For that you need to be able to leave empty spots open in the string, e.g. see the days-of-the-week example.

Code: QB64:

'split test.bas for qb64 bplus 2018-05-07
' I think I want to replace my inefficient Wrd function
 
'2018-08-25 reworked for space delimiters and more variable declares
'2019-02-15 add Luke's version to compare
ntests = 5
DIM a(ntests - 1) AS STRING, d(ntests - 1) AS STRING
 
a(0) = ""
d(0) = " "
a(1) = " test test    test " 'good no error!
d(1) = " "
a(2) = " test"
d(2) = " "
a(3) = "3d,z6d,z1 10 #d,z5"
d(3) = ",z"
a(4) = "Monday, , Wednesday, THursday, Friday, , Sunday"
d(4) = ", "
 
FOR test = 0 TO ntests - 1
    PRINT: PRINT "splitting {"; a(test); "} with delimiter {"; d(test); "}"
    REDIM myarr(0) AS STRING '<<<<< REDIM forces the creation of a dynamic/resizable array
    Split a(test), d(test), myarr()
    amax = UBOUND(myarr)
    FOR i = 0 TO amax
        PRINT i; ":"; myarr(i)
    NEXT i
    INPUT "press enter for next test... "; wate$
NEXT
 
' how about a quick file reader test?
PRINT: INPUT "Press enter for file test, any other + enter quits! "; wate$
IF LEN(wate$) THEN END
CLS
 
'other wise continue
OPEN "Split test.bas" FOR BINARY AS #1 '<<< this file name!!!
ftext$ = SPACE$(LOF(1))
GET #1, , ftext$
CLOSE #1
Split ftext$, CHR$(13) + CHR$(10), myarr()
FOR i = 0 TO UBOUND(myarr)
    PRINT myarr(i)
    IF i MOD 20 = 19 THEN PRINT: INPUT "press enter for more "; wate$
NEXT
PRINT "the end"
END ' end program
 
 
'the space delimiter is such a special case perhaps I should develop a single split for that alone?
 
 
'notes: REDIM the array(0) to be loaded before calling Split '<<<<<<<<<<<<<<<<<<<<<<< IMPORTANT!!!!
SUB Split (mystr AS STRING, delim AS STRING, arr() AS STRING)
    ' bplus modifications of Galleon fix of Bulrush Split reply #13
    ' http://xmaxw.[abandoned, outdated and now likely malicious qb64 dot net website - don’t go there]/forum/index.php?topic=1612.0
    ' this sub further developed and tested here: \test\Strings\Split test.bas
    DIM copy AS STRING, p AS LONG, curpos AS LONG, arrpos AS LONG, dpos AS LONG
 
    copy = mystr 'make copy since we are messing with mystr when the delimiter is a space
 
    'special case if delim is space, probably want to remove all excess space
    IF delim = " " THEN
        copy = RTRIM$(LTRIM$(copy))
        p = INSTR(copy, "  ")
        WHILE p > 0
            copy = MID$(copy, 1, p - 1) + MID$(copy, p + 1)
            p = INSTR(copy, "  ")
        WEND
    END IF
    curpos = 1
    arrpos = 0
    dpos = INSTR(curpos, copy, delim)
    DO UNTIL dpos = 0
        arr(arrpos) = MID$(copy, curpos, dpos - curpos)
        arrpos = arrpos + 1
        REDIM _PRESERVE arr(arrpos + 1) AS STRING
        curpos = dpos + LEN(delim)
        dpos = INSTR(curpos, copy, delim)
    LOOP
    arr(arrpos) = MID$(copy, curpos)
    REDIM _PRESERVE arr(arrpos) AS STRING 'need this line? YES to get the ubound correct
END SUB
 
 
' Luke 2019-02-15
'Split in$ into pieces, chopping at every occurrence of delimiter$. Multiple consecutive occurrences
'of delimiter$ are treated as a single instance. The chopped pieces are stored in result$().
'
'delimiter$ must be one character long.
'result$() must have been REDIMmed previously.
SUB Lsplit (in$, delimiter$, result$())
    REDIM result$(-1)
    start = 1
    DO
        WHILE MID$(in$, start, 1) = delimiter$
            start = start + 1
            IF start > LEN(in$) THEN EXIT SUB
        WEND
        finish = INSTR(start, in$, delimiter$)
        IF finish = 0 THEN finish = LEN(in$) + 1
        REDIM _PRESERVE result$(0 TO UBOUND(result$) + 1)
        result$(UBOUND(result$)) = MID$(in$, start, finish - start)
        start = finish + 1
    LOOP WHILE start <= LEN(in$)
END SUB
 
'Combine all elements of in$() into a single string with delimiter$ separating the elements.
FUNCTION join$ (in$(), delimiter$)
    result$ = in$(LBOUND(in$))
    FOR i = LBOUND(in$) + 1 TO UBOUND(in$)
        result$ = result$ + delimiter$ + in$(i)
    NEXT i
    join$ = result$
END FUNCTION
 
 

I would like to test RhoSigma's but looks like a couple of specific delimiters are used.

I think I had a reason not to redim the arr inside the Split sub, can't recall it now. It does seem more convenient to handle it inside the sub.

Append: you only have to redim once outside the Split sub before the call to it, to let QB64 know you are using that name as a dynamic array.

RhoSigma · « **Reply #3 on:** February 15, 2019, 04:03:16 pm »

Quote from: bplus on February 15, 2019, 01:38:44 pm

I would like to test RhoSigma's but looks like a couple of specific delimiters are used.

??? - I hope you aren't talking about the CHR$(34) in the example lines; it's a quote (").

The ParseLine&() function is a rather simple (and still incomplete) command-line parsing function. Whitespace separates options/words, where whitespace means any number of TABs and/or SPACEs. Quoted parts are taken as is (i.e. including whitespace), as you would usually need on a command line for filenames that contain spaces or characters with special meaning in the command interpreter, such as %()<>& etc.
Otherwise, my function does not allow for any user-defined delimiters.
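
Since the old code is no longer posted in this thread, here is a minimal sketch of the behaviour described above, for anyone who wants to experiment. It is not ParseLine&(): QuoteSplit and all of its variable names are made up for illustration, and it assumes quote characters act only as delimiters and are dropped from the result.

```qb64
REDIM parts(0) AS STRING
cmd$ = "copy " + CHR$(34) + "my file.txt" + CHR$(34) + CHR$(9) + "backup"
QuoteSplit cmd$, parts()
FOR i = 0 TO UBOUND(parts)
    PRINT i; ": ["; parts(i); "]"
NEXT
END

'Hypothetical helper (not ParseLine&): splits text on runs of SPACE/TAB,
'keeping "quoted" sections together as one item. Result is 0-based.
SUB QuoteSplit (text AS STRING, arr() AS STRING)
    DIM i AS LONG, n AS LONG, inQuote AS LONG, started AS LONG
    DIM c AS STRING, piece AS STRING
    REDIM arr(0) AS STRING 'stays as one empty element if no items are found
    n = -1
    FOR i = 1 TO LEN(text)
        c = MID$(text, i, 1)
        IF c = CHR$(34) THEN
            inQuote = NOT inQuote 'toggle quoted state, the quote itself is dropped
            started = -1 'so that an empty pair of quotes still yields an (empty) item
        ELSEIF (c = " " OR c = CHR$(9)) AND NOT inQuote THEN
            IF started THEN 'whitespace outside quotes ends the current item
                n = n + 1
                REDIM _PRESERVE arr(n) AS STRING
                arr(n) = piece
                piece = "": started = 0
            END IF
        ELSE
            piece = piece + c: started = -1
        END IF
    NEXT
    IF started THEN 'store the last item if the line did not end in whitespace
        n = n + 1
        REDIM _PRESERVE arr(n) AS STRING
        arr(n) = piece
    END IF
END SUB
```

With the sample line this should give three items: copy, my file.txt and backup. An unterminated quote simply runs to the end of the line.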

SMcNeill · « **Reply #4 on:** February 15, 2019, 08:09:23 pm »

Here's a split routine which I'll toss into the mess as well:

Code: QB64:

CONST ntests = 5
DIM a(ntests - 1) AS STRING, d(ntests - 1) AS STRING
REDIM results(0) AS STRING
 
a(0) = ""
d(0) = " "
a(1) = " test test    test " 'good no error!
d(1) = " "
a(2) = " test"
d(2) = " "
a(3) = "3d,z6d,z1 10 #d,z5"
d(3) = ",z"
a(4) = "Monday, , Wednesday, THursday, Friday, , Sunday"
d(4) = ", "
 
FOR i = 0 TO ntests - 1
    PRINT "Splitting: "; a(i)
    SteveSplit a(i), d(i), results()
    FOR j = 1 TO UBOUND(results)
        PRINT j, results(j)
    NEXT
    SLEEP
NEXT
 
SUB SteveSplit (text$, delimiter$, storage_array() AS STRING)
    STATIC count AS LONG
    count = count + 1
    u = UBOUND(storage_array)
    IF count > u THEN REDIM _PRESERVE storage_array(u + 1000) AS STRING
    i = INSTR(text$, delimiter$)
    IF i THEN
        storage_array(count) = LEFT$(text$, i - 1)
        SteveSplit MID$(text$, i + LEN(delimiter$)), delimiter$, storage_array()
    ELSE
        storage_array(count) = text$
        REDIM _PRESERVE storage_array(count) AS STRING
        count = 0
    END IF
END SUB

I was even nice and named it "SteveSplit" so folks can test for speed and compare results, if they wish, versus the other split routines.

One thing to note: This *doesn't* strip off any extra leading/trailing spaces. Why would it, if they're delimiters in our data?

Let's say we have the data of the following:

1,2,,,5,6

When we INPUT it from a file, we get data of:
"1"
"2"
""
""
"5"
"6"

Those commas are valid delimiters of null strings. If we're using a space as a delimiter, then shouldn't it also follow the same behavior of the comma? If the user doesn't want to include/process null strings, then let them ignore them elsewhere in their code. As far as our data is concerned, they're valid split points, in my opinion.
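
A rough sketch of what ignoring them elsewhere could look like: a small helper that compacts the array after the split. DropNulls is a hypothetical name, not part of any routine in this thread, and the demo array is filled by hand to stand in for the results of splitting 1,2,,,5,6.

```qb64
REDIM items(5) AS STRING
items(0) = "1": items(1) = "2": items(4) = "5": items(5) = "6" 'items(2) and items(3) stay ""
DropNulls items()
FOR i = LBOUND(items) TO UBOUND(items)
    PRINT items(i)
NEXT
END

'Hypothetical helper: removes empty strings from arr() in place, keeping the order.
'If every element is empty, the array is left untouched.
SUB DropNulls (arr() AS STRING)
    DIM i AS LONG, kept AS LONG
    kept = LBOUND(arr) 'next free slot for a non-empty item
    FOR i = LBOUND(arr) TO UBOUND(arr)
        IF LEN(arr(i)) THEN
            arr(kept) = arr(i)
            kept = kept + 1
        END IF
    NEXT
    IF kept > LBOUND(arr) THEN REDIM _PRESERVE arr(LBOUND(arr) TO kept - 1) AS STRING
END SUB
```

The split itself stays simple and uniform, and only the callers that don't care about null strings pay for removing them.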

SMcNeill · « **Reply #5 on:** February 16, 2019, 02:14:25 am »

And a timed comparison of the three routines:

Code: QB64:

CONST ntests = 5
DIM a(ntests - 1) AS STRING, d(ntests - 1) AS STRING
REDIM results1(0) AS STRING
REDIM results2(0) AS STRING
REDIM results3(0) AS STRING
 
CONST Limit = 1000000
 
a(0) = ""
d(0) = " "
a(1) = " test test    test " 'good no error!
d(1) = " "
a(2) = " test"
d(2) = " "
a(3) = "3d,z6d,z1 10 #d,z5"
d(3) = ",z"
a(4) = "Monday, , Wednesday, THursday, Friday, , Sunday"
d(4) = ", "
 
FOR i = 0 TO ntests - 1
    t# = TIMER
    FOR j = 1 TO Limit 'repeat the process multiple times so we can time it.
        SteveSplit a(i), d(i), results1()
    NEXT
    t1# = TIMER
    FOR j = 1 TO Limit 'repeat the process multiple times so we can time it.
        REDIM results2(0) AS STRING
        Split a(i), d(i), results2()
    NEXT
    t2# = TIMER
    FOR j = 1 TO Limit 'repeat the process multiple times so we can time it.
        Lsplit a(i), d(i), results3()
    NEXT
    t3# = TIMER
    PRINT "TEST #"; i; " -- Splitting: "; CHR$(34); a(i); CHR$(34); " with "; CHR$(34); d(i); CHR$(34)
    PRINT USING "###.####     ###.####     ###.####"; t1# - t#, t2# - t1#, t3# - t2#
 
    FOR j = 1 TO UBOUND(results1)
        PRINT j, CHR$(34); results1(j); CHR$(34),
        IF j <= UBOUND(results2) + 1 THEN PRINT CHR$(34); results2(j - 1); CHR$(34),
        IF j <= UBOUND(results3) + 1 THEN PRINT CHR$(34); results3(j - 1); CHR$(34),
        PRINT
    NEXT
    SLEEP
 
NEXT
 
 
 
'notes: REDIM the array(0) to be loaded before calling Split '<<<<<<<<<<<<<<<<<<<<<<< IMPORTANT!!!!
SUB Split (mystr AS STRING, delim AS STRING, arr() AS STRING)
    ' bplus modifications of Galleon fix of Bulrush Split reply #13
    ' http://xmaxw.[abandoned, outdated and now likely malicious qb64 dot net website - don’t go there]/forum/index.php?topic=1612.0
    ' this sub further developed and tested here: \test\Strings\Split test.bas
    DIM copy AS STRING, p AS LONG, curpos AS LONG, arrpos AS LONG, dpos AS LONG
 
    copy = mystr 'make copy since we are messing with mystr when the delimiter is a space
 
    'special case if delim is space, probably want to remove all excess space
    IF delim = " " THEN
        copy = RTRIM$(LTRIM$(copy))
        p = INSTR(copy, "  ")
        WHILE p > 0
            copy = MID$(copy, 1, p - 1) + MID$(copy, p + 1)
            p = INSTR(copy, "  ")
        WEND
    END IF
    curpos = 1
    arrpos = 0
    dpos = INSTR(curpos, copy, delim)
    DO UNTIL dpos = 0
        arr(arrpos) = MID$(copy, curpos, dpos - curpos)
        arrpos = arrpos + 1
        REDIM _PRESERVE arr(arrpos + 1) AS STRING
        curpos = dpos + LEN(delim)
        dpos = INSTR(curpos, copy, delim)
    LOOP
    arr(arrpos) = MID$(copy, curpos)
    REDIM _PRESERVE arr(arrpos) AS STRING 'need this line? YES to get the ubound correct
END SUB
 
 
' Luke 2019-02-15
'Split in$ into pieces, chopping at every occurrence of delimiter$. Multiple consecutive occurrences
'of delimiter$ are treated as a single instance. The chopped pieces are stored in result$().
'
'delimiter$ must be one character long.
'result$() must have been REDIMmed previously.
SUB Lsplit (in$, delimiter$, result$())
    REDIM result$(-1)
    start = 1
    DO
        WHILE MID$(in$, start, 1) = delimiter$
            start = start + 1
            IF start > LEN(in$) THEN EXIT SUB
        WEND
        finish = INSTR(start, in$, delimiter$)
        IF finish = 0 THEN finish = LEN(in$) + 1
        REDIM _PRESERVE result$(0 TO UBOUND(result$) + 1)
        result$(UBOUND(result$)) = MID$(in$, start, finish - start)
        start = finish + 1
    LOOP WHILE start <= LEN(in$)
END SUB
 
'Combine all elements of in$() into a single string with delimiter$ separating the elements.
FUNCTION join$ (in$(), delimiter$)
    result$ = in$(LBOUND(in$))
    FOR i = LBOUND(in$) + 1 TO UBOUND(in$)
        result$ = result$ + delimiter$ + in$(i)
    NEXT i
    join$ = result$
END FUNCTION

SUB SteveSplit (text$, delimiter$, storage_array() AS STRING)
    STATIC count AS LONG
    count = count + 1
    u = UBOUND(storage_array)
    IF count > u THEN REDIM _PRESERVE storage_array(u + 1000) AS STRING
    i = INSTR(text$, delimiter$)
    IF i THEN
        storage_array(count) = LEFT$(text$, i - 1)
        SteveSplit MID$(text$, i + LEN(delimiter$)), delimiter$, storage_array()
    ELSE
        storage_array(count) = text$
        REDIM _PRESERVE storage_array(count) AS STRING
        count = 0
    END IF
END SUB

On my machine, the test speeds (in seconds) are as follows:

Test       SteveSplit   Split (bplus)   LSplit (Luke)
TEST #0    0.1650       0.4395          0.3296
TEST #1    2.3076       2.3076          1.5386
TEST #2    0.4399       0.5493          0.5493
TEST #3    1.0439       1.2637          1.5381 (false results)
TEST #4    1.9229       2.1431          2.6924 (false results)

In tests 0, 2, 3, 4, SteveSplit ran fastest.
In test 1, LSplit was the fastest routine.

The difference in the speeds in Test #1 comes down to a general philosophy of *how* we behave when splitting. My routine generates several null strings, delimited by spaces. For everyone else, the extra spaces are simply ignored and removed.

Luke's routine is set to only use a single character as a string delimiter, so for test 3 and 4, it produces false results as we have a 2-character delimiter.

******************

As to WHY my routine returns those extra null strings, it's simply to answer uniformly the question, "How would we behave if we used a FIND AND REPLACE to change all the spaces to commas?"

Instead of a(1) = " test test    test ", what results would we expect if a(1) = ",test,test,,,,test," and we used a comma as a delimiter instead of a space? Would we not then count the characters between each pair of commas as a null string?

I'd think so, and if two commas side-by-side designate a null-string result between them, then I feel like two spaces side-by-side should do the same for space delimited data. This concept also allows me to perfectly reproduce the original data when using a join function, without automatically losing those spaces which may (or may not) have been important for formatting or data purposes.

If I don't want the extra spaces, all I need to do is ignore those null-strings.
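
To illustrate the round trip, here is a small self-contained sketch. KeepSplit and JoinAll$ are hypothetical names written just for this example (an iterative, 0-based split that keeps null strings, and a plain join); they are not SteveSplit or Luke's join$, and the delimiter is assumed to be non-empty.

```qb64
original$ = " test test    test "
REDIM parts(0) AS STRING
KeepSplit original$, " ", parts()
rebuilt$ = JoinAll$(parts(), " ")
PRINT UBOUND(parts) + 1; "pieces"
IF rebuilt$ = original$ THEN PRINT "round trip OK" ELSE PRINT "round trip differs"
END

'Hypothetical split that keeps every null string between delimiters (0-based result).
'Assumes delim is not an empty string.
SUB KeepSplit (text AS STRING, delim AS STRING, arr() AS STRING)
    DIM curpos AS LONG, dpos AS LONG, n AS LONG
    REDIM arr(0) AS STRING
    curpos = 1
    dpos = INSTR(curpos, text, delim)
    DO UNTIL dpos = 0
        arr(n) = MID$(text, curpos, dpos - curpos) 'may be "" for adjacent delimiters
        n = n + 1
        REDIM _PRESERVE arr(n) AS STRING
        curpos = dpos + LEN(delim)
        dpos = INSTR(curpos, text, delim)
    LOOP
    arr(n) = MID$(text, curpos) 'whatever follows the last delimiter
END SUB

'Hypothetical join: puts delim between every pair of neighbouring elements.
FUNCTION JoinAll$ (arr() AS STRING, delim AS STRING)
    DIM i AS LONG, result AS STRING
    FOR i = LBOUND(arr) TO UBOUND(arr)
        IF i > LBOUND(arr) THEN result = result + delim
        result = result + arr(i)
    NEXT
    JoinAll$ = result
END FUNCTION
```

The sample string contains seven spaces, so the split yields eight pieces (five of them null strings), and joining them with a space gives back the original string exactly.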

bplus · « **Reply #6 on:** February 16, 2019, 09:26:49 am »

Steve:

Quote

The difference in the speeds in Test #1 comes down to a general philosophy of *how* we behave when splitting. My routine generates several null strings, delimited by spaces. For everyone else, the extra spaces are simply ignored and removed.

Luke's routine is set to only use a single character as a string delimiter, so for test 3 and 4, it produces false results as we have a 2-character delimiter.

******************

As to WHY my routine returns those extra null strings, it's simply to answer uniformly the question, "How would we behave if we used a FIND AND REPLACE to change all the spaces to commas?"

Instead of a(1) = " test test    test ", what results would we expect if a(1) = ",test,test,,,,test," and we used a comma as a delimiter instead of a space? Would we not then count the characters between each pair of commas as a null string?

I'd think so, and if two commas side-by-side designate a null-string result between them, then I feel like two spaces side-by-side should do the same for space delimited data. This concept also allows me to perfectly reproduce the original data when using a join function, without automatically losing those spaces which may (or may not) have been important for formatting or data purposes.

If I don't want the extra spaces, all I need to do is ignore those null-strings.

Well, in defense of my routine, you have the best of both worlds:
1. If you want to ignore extra spaces and let any number of spaces separate items, use a space delimiter.
2. Otherwise, use any other delimiter of any length!

There is a real need for option #1, like when you string a number of numbers together: you won't need to worry about trimming them, which would otherwise load the array with unwanted null strings. Having to ignore unwanted null strings would likely pose a needless burden on an app that depends upon the order and placement of the items. I have in mind my simple little interpreter, which used no punctuation, only spaces as a delimiter.

Once you have a good general Split routine working, it would be a snap to modify and optimize it for a particular application.
Likely the app would need only one delimiter, or only 1-character delimiters, or it may go the n-spaces route. It's a piece of cake to drop the unneeded general options to speed the thing up for the app. For this reason, Steve, your method of REDIM _PRESERVE in large batches of, say, 1000 instead of at every new item is a good idea that I will incorporate in my general Split routine.

SMcNeill · « **Reply #7 on:** February 16, 2019, 10:28:33 am »

Generally, if I want to strip out extra spaces, I'll just run the string through a find-replace routine to change "  " (two spaces) to " " (one space) before running the split routine. It's a fast process, and it saves time overall to do the whole cleanup at once instead of checking/trimming on every loop inside the split routine.

And if you notice, my little routine only resizes the array larger when needed. Since REDIM _PRESERVE is a slow process, it's better to oversize the array by 1000 (or more) elements and then resize down when finished than to resize it to the exact size for each word. It's very rare that I'll ever REDIM _PRESERVE Array(Limit + 1). REDIM _PRESERVE Array(Limit + 1000) is generally the smallest increment I'll use in a routine, and then I'll resize to free unneeded memory when done. ;)
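
Stripped of the splitting itself, the grow-in-chunks pattern looks roughly like this. It's only a sketch: the list array, the used counter and the 2500 dummy items are made up for the demo, and index 0 is left unused to match the 1-based storage in SteveSplit.

```qb64
REDIM list(0) AS STRING
DIM used AS LONG
FOR k = 1 TO 2500
    used = used + 1
    'grow only when full, and then by 1000 slots at once instead of 1
    IF used > UBOUND(list) THEN REDIM _PRESERVE list(UBOUND(list) + 1000) AS STRING
    list(used) = "item" + STR$(k)
NEXT
REDIM _PRESERVE list(used) AS STRING 'trim the unused tail when finished
PRINT UBOUND(list)
```

Here the array is enlarged three times (to 1000, 2000 and 3000 elements) plus one final trim to 2500, instead of being resized once for every item.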

bplus · « **Reply #8 on:** February 16, 2019, 11:21:02 am »

Quote from: SMcNeill on February 16, 2019, 10:28:33 am

Generally, if I want to strip out extra spaces, I'll just run the string through a find-replace routine to change "  " (two spaces) to " " (one space) before running the split routine. It's a fast process, and it saves time overall to do the whole cleanup at once instead of checking/trimming on every loop inside the split routine.

And if you notice, my little routine only resizes the array larger when needed. Since REDIM _PRESERVE is a slow process, it's better to oversize the array by 1000 (or more) elements and then resize down when finished than to resize it to the exact size for each word. It's very rare that I'll ever REDIM _PRESERVE Array(Limit + 1). REDIM _PRESERVE Array(Limit + 1000) is generally the smallest increment I'll use in a routine, and then I'll resize to free unneeded memory when done. ;)

Well Steve, I did strip out all extra spaces (if more than one) before running through the Split code (only if the delimiter was a space). I also did notice you increasing the array by chunks instead of by 1, and I already mentioned I was going to use that time-saver tip!

One more interesting point about your method is that it is recursive, which is a favorite technique I like to see employed. But I have heard that recursive techniques ultimately aren't as efficient as non-recursive ones, i.e. for every recursive routine there exists a more efficient non-recursive one. (The proof of that might be interesting!) If true, then there is a faster version still needing to be revealed. :)

Pete · « **Reply #9 on:** February 16, 2019, 01:17:11 pm »

I have had to trim extra spaces a lot in html parsing. I made something simple years ago that I still use for that. It's so simple, I'll just code it here...

Code: QB64:

WIDTH 120, 25
a$ = "   This   is a test      of   eliminating     multiple         spaces    in    a text    line.   "
PRINT a$
DO UNTIL INSTR(a$, "  ") = 0
    a$ = MID$(a$, 1, INSTR(a$, "  ") - 1) + MID$(a$, INSTR(a$, "  ") + 1)
LOOP
a$ = RTRIM$(LTRIM$(a$))
PRINT a$
 

I probably should have read previous posts before putting it up. It may have no relevance, but at least it prints text with an apolitical message!

Replacement, splitting, concatenating are all very useful. Good luck with these additions, as I assume this is more stuff for the tool box forum.

Pete

bplus · « **Reply #10 on:** February 17, 2019, 12:44:45 pm »

Ha, ha, ha! Steve pulled a fast one! ;)

With such trivial tests his Split shines but check out a test with 10,000 items to Split and the new Split1000 sub:

Code: QB64:

'split test.bas for qb64 bplus 2018-05-07
' directly below is Steve's Timed test orig tests commented out
' 2019-02-17 modified by B+ with a new Split1000 sub and a SERIOUS String to Split!
 
'=================================================================== Steve's speed test
CONST ntests = 6
DIM a(ntests - 1) AS STRING, d(ntests - 1) AS STRING
REDIM results1(0) AS STRING
REDIM results2(0) AS STRING
REDIM results3(0) AS STRING
 
CONST Limit = 100
 
'trivial tests just to test accuracy of split
a(0) = ""
d(0) = " "
a(1) = " test test    test " 'good no error!
d(1) = " "
a(2) = " test"
d(2) = " "
a(3) = "3d,z6d,z1 10 #d,z5"
d(3) = ",z"
a(4) = "Monday, , Wednesday, THursday, Friday, , Sunday"
d(4) = ", "
 
'lets get a serious test in here!! test a 10,000 random number string
FOR i = 1 TO 10000
    s$ = s$ + STR$(RND)
NEXT
a(5) = s$
d(5) = " "
 
FOR i = 0 TO ntests - 1
    CLS
    t# = TIMER
    FOR j = 1 TO Limit 'repeat the process multiple times so we can time it.
        SteveSplit a(i), d(i), results1()
    NEXT
    t1# = TIMER
    FOR j = 1 TO Limit 'repeat the process multiple times so we can time it.
        REDIM results2(0) AS STRING
        Split1000 a(i), d(i), results2()
    NEXT
    t2# = TIMER
    FOR j = 1 TO Limit 'repeat the process multiple times so we can time it.
        Lsplit a(i), d(i), results3()
    NEXT
    t3# = TIMER
    PRINT "TEST #"; i; " -- Splitting: "; CHR$(34); MID$(a(i), 1, 80); CHR$(34); " with "; CHR$(34); d(i); CHR$(34)
    PRINT: PRINT "Test names:", "SteveSplit", "Split1000", "Lsplit"
    PRINT "Times:",
    PRINT USING "###.####     ###.####     ###.####"; t1# - t#, t2# - t1#, t3# - t2#
    PRINT: PRINT "First Items in Results arrays (up to 10):"
    FOR j = 1 TO 10
        p = 0
        IF j <= UBOUND(results1) THEN p = 1: PRINT j, CHR$(34); MID$(results1(j), 1, 15); CHR$(34),
        IF j <= UBOUND(results2) + 1 THEN p = 1: PRINT CHR$(34); results2(j - 1); CHR$(34),
        IF j <= UBOUND(results3) + 1 THEN p = 1: PRINT CHR$(34); results3(j - 1); CHR$(34),
        IF p THEN PRINT
    NEXT
    PRINT: INPUT "Press enter for next test... "; wate$
 
NEXT
 
 
'' ================================= My Old Split test Code
'the space delimiter is such a special case perhaps I should develope a single split for that alone?
''2018-08-25 reworked for space delimiters and more variable declares
''2019-02-15 add Luke's version to compare
'ntests = 5
'DIM a(ntests - 1) AS STRING, d(ntests - 1) AS STRING
 
'a(0) = ""
'd(0) = " "
'a(1) = " test test    test " 'good no error!
'd(1) = " "
'a(2) = " test"
'd(2) = " "
'a(3) = "3d,z6d,z1 10 #d,z5"
'd(3) = ",z"
'a(4) = "Monday, , Wednesday, THursday, Friday, , Sunday"
'd(4) = ", "
'REDIM myarr(0) AS STRING '<<<<< REDIM forces the creation of a dynamic/resizable array
'FOR test = 0 TO ntests - 1
'    PRINT: PRINT "splitting {"; a(test); "} with delimeter {"; d(test); "}"
'    Split1000 a(test), d(test), myarr()
'    amax = UBOUND(myarr)
'    FOR i = 0 TO amax
'        PRINT i; ":"; myarr(i)
'    NEXT i
'    INPUT "press enter for next test... "; wate$
'NEXT
 
'' how about a quick file reader test?
'PRINT: INPUT "Press enter for file test, any other + enter quits! "; wate$
'IF LEN(wate$) THEN END
'CLS
 
''other wise continue
'OPEN "Split test.bas" FOR BINARY AS #1 '<<< this file name!!!
'ftext$ = SPACE$(LOF(1))
'GET #1, , ftext$
'CLOSE #1
'Split ftext$, CHR$(13) + CHR$(10), myarr()
'FOR i = 0 TO UBOUND(myarr)
'    PRINT myarr(i)
'    IF i MOD 20 = 19 THEN PRINT: INPUT "press enter for more "; wate$
'NEXT
'PRINT "the end"
'END ' end program
 
 
 
'
'notes: REDIM the array(0) to be loaded before calling Split '<<<<<<<<<<<<<<<<<<<<<<< IMPORTANT!!!!
SUB Split1000 (mystr AS STRING, delim AS STRING, arr() AS STRING)
    ' bplus modifications of Galleon fix of Bulrush Split reply #13
    ' http://xmaxw.[abandoned, outdated and now likely malicious qb64 dot net website - don’t go there]/forum/index.php?topic=1612.0
    ' this sub further developed and tested here: \test\Strings\Split test.bas
    DIM copy AS STRING, p AS LONG, curpos AS LONG, arrpos AS LONG, dpos AS LONG
 
    copy = mystr 'make copy since we are messing with mystr when the delimiter is a space
 
    'special case if delim is space, probably want to remove all excess space
    IF delim = " " THEN
        copy = RTRIM$(LTRIM$(copy))
        p = INSTR(copy, "  ")
        WHILE p > 0
            copy = MID$(copy, 1, p - 1) + MID$(copy, p + 1)
            p = INSTR(copy, "  ")
        WEND
    END IF
    curpos = 1
    arrpos = 0
    dpos = INSTR(curpos, copy, delim)
    DO UNTIL dpos = 0
        arr(arrpos) = MID$(copy, curpos, dpos - curpos)
        arrpos = arrpos + 1
        IF arrpos > UBOUND(arr) THEN REDIM _PRESERVE arr(UBOUND(arr) + 1000) AS STRING
        curpos = dpos + LEN(delim)
        dpos = INSTR(curpos, copy, delim)
    LOOP
    arr(arrpos) = MID$(copy, curpos)
    REDIM _PRESERVE arr(arrpos) AS STRING 'need this line? YES to get the ubound correct
END SUB
 
 
'notes: REDIM the array(0) to be loaded before calling Split '<<<<<<<<<<<<<<<<<<<<<<< IMPORTANT!!!!
SUB Split (mystr AS STRING, delim AS STRING, arr() AS STRING)
    ' bplus modifications of Galleon fix of Bulrush Split reply #13
    ' http://xmaxw.[abandoned, outdated and now likely malicious qb64 dot net website - don’t go there]/forum/index.php?topic=1612.0
    ' this sub further developed and tested here: \test\Strings\Split test.bas
    DIM copy AS STRING, p AS LONG, curpos AS LONG, arrpos AS LONG, dpos AS LONG
 
    copy = mystr 'make copy since we are messing with mystr when the delimiter is a space
 
    'special case if delim is space, probably want to remove all excess space
    IF delim = " " THEN
        copy = RTRIM$(LTRIM$(copy))
        p = INSTR(copy, "  ")
        WHILE p > 0
            copy = MID$(copy, 1, p - 1) + MID$(copy, p + 1)
            p = INSTR(copy, "  ")
        WEND
    END IF
    curpos = 1
    arrpos = 0
    dpos = INSTR(curpos, copy, delim)
    DO UNTIL dpos = 0
        arr(arrpos) = MID$(copy, curpos, dpos - curpos)
        arrpos = arrpos + 1
        REDIM _PRESERVE arr(arrpos + 1000) AS STRING
        curpos = dpos + LEN(delim)
        dpos = INSTR(curpos, copy, delim)
    LOOP
    arr(arrpos) = MID$(copy, curpos)
    REDIM _PRESERVE arr(arrpos) AS STRING 'need this line? YES to get the ubound correct
END SUB
 
 
' Luke 2019-02-15
'Split in$ into pieces, chopping at every occurrence of delimiter$. Multiple consecutive occurrences
'of delimiter$ are treated as a single instance. The chopped pieces are stored in result$().
'
'delimiter$ must be one character long.
'result$() must have been REDIMmed previously.
SUB Lsplit (in$, delimiter$, result$())
    REDIM result$(-1)
    start = 1
    DO
        WHILE MID$(in$, start, 1) = delimiter$
            start = start + 1
            IF start > LEN(in$) THEN EXIT SUB
        WEND
        finish = INSTR(start, in$, delimiter$)
        IF finish = 0 THEN finish = LEN(in$) + 1
        REDIM _PRESERVE result$(0 TO UBOUND(result$) + 1)
        result$(UBOUND(result$)) = MID$(in$, start, finish - start)
        start = finish + 1
    LOOP WHILE start <= LEN(in$)
END SUB
 
'Combine all elements of in$() into a single string with delimiter$ separating the elements.
FUNCTION join$ (in$(), delimiter$)
    result$ = in$(LBOUND(in$))
    FOR i = LBOUND(in$) + 1 TO UBOUND(in$)
        result$ = result$ + delimiter$ + in$(i)
    NEXT i
    join$ = result$
END FUNCTION
 
SUB SteveSplit (text$, delimiter$, storage_array() AS STRING)
    STATIC count AS LONG
    count = count + 1
    u = UBOUND(storage_array)
    IF count > u THEN REDIM _PRESERVE storage_array(u + 1000) AS STRING
    i = INSTR(text$, delimiter$)
    IF i THEN
        storage_array(count) = LEFT$(text$, i - 1)
        SteveSplit MID$(text$, i + LEN(delimiter$)), delimiter$, storage_array()
    ELSE
        storage_array(count) = text$
        REDIM _PRESERVE storage_array(count) AS STRING
        count = 0
    END IF
END SUB
 
 

I knew his recursive method was curse worthy. ;-)))

SMcNeill · « **Reply #11 on:** February 17, 2019, 01:49:10 pm »

25 seconds doesn’t seem right at all. I’ll do some digging to see what’s up later and report back. ;)

SMcNeill · « **Reply #12 on:** February 17, 2019, 02:46:48 pm »

Try this non-recursive version and see how it performs for you:

Code: QB64:

'split test.bas for qb64 bplus 2018-05-07
' directly below is Steve's Timed test orig tests commented out
' 2019-02-17 modified by B+ with a new Split1000 sub and a SERIOUS String to Split!
 
'=================================================================== Steve's speed test
CONST ntests = 6
DIM a(ntests - 1) AS STRING, d(ntests - 1) AS STRING
REDIM results1(0) AS STRING
REDIM results2(0) AS STRING
REDIM results3(0) AS STRING
 
CONST Limit = 100
CONST NoNull = -1
 
'trivial tests just to test accuracy of split
a(0) = ""
d(0) = " "
a(1) = " test test    test " 'good no error!
d(1) = " "
a(2) = " test"
d(2) = " "
a(3) = "3d,z6d,z1 10 #d,z5"
d(3) = ",z"
a(4) = "Monday, , Wednesday, THursday, Friday, , Sunday"
d(4) = ", "
 
'lets get a serious test in here!! test a 10,000 random number string
FOR i = 1 TO 10000
    s$ = s$ + STR$(RND)
NEXT
a(5) = s$
d(5) = " "
 
FOR i = 0 TO ntests - 1
    CLS
    t# = TIMER
    FOR j = 1 TO Limit 'repeat the process multiple times so we can time it.
        SteveSplit2 a(i), d(i), results1(), NoNull
    NEXT
    t1# = TIMER
    FOR j = 1 TO Limit 'repeat the process multiple times so we can time it.
        REDIM results2(0) AS STRING
        Split1000 a(i), d(i), results2()
    NEXT
    t2# = TIMER
    FOR j = 1 TO Limit 'repeat the process multiple times so we can time it.
        Lsplit a(i), d(i), results3()
    NEXT
    t3# = TIMER
    PRINT "TEST #"; i; " -- Splitting: "; CHR$(34); MID$(a(i), 1, 80); CHR$(34); " with "; CHR$(34); d(i); CHR$(34)
    PRINT: PRINT "Test names:", "SteveSplit2", "Split1000", "Lsplit"
    PRINT "Times:",
    PRINT USING "###.####     ###.####     ###.####"; t1# - t#, t2# - t1#, t3# - t2#
    PRINT: PRINT "First Items in Results arrays (up to 10):"
    FOR j = 1 TO 10
        p = 0
        PRINT j,
        IF j <= UBOUND(results1) THEN PRINT CHR$(34); MID$(results1(j), 1, 15); CHR$(34), ELSE PRINT ,
        IF j <= UBOUND(results2) + 1 THEN PRINT CHR$(34); results2(j - 1); CHR$(34), ELSE PRINT ,
        IF j <= UBOUND(results3) + 1 THEN PRINT CHR$(34); results3(j - 1); CHR$(34),
        PRINT
    NEXT
    PRINT: INPUT "Press enter for next test... "; wate$
 
NEXT
 
 
 
'
'notes: REDIM the array(0) to be loaded before calling Split '<<<<<<<<<<<<<<<<<<<<<<< IMPORTANT!!!!
SUB Split1000 (mystr AS STRING, delim AS STRING, arr() AS STRING)
    ' bplus modifications of Galleon fix of Bulrush Split reply #13
    ' http://xmaxw.[abandoned, outdated and now likely malicious qb64 dot net website - don’t go there]/forum/index.php?topic=1612.0
    ' this sub further developed and tested here: \test\Strings\Split test.bas
    DIM copy AS STRING, p AS LONG, curpos AS LONG, arrpos AS LONG, dpos AS LONG
 
    copy = mystr 'make copy since we are messing with mystr when the delimiter is a space
 
    'special case if delim is space, probably want to remove all excess space
    IF delim = " " THEN
        copy = RTRIM$(LTRIM$(copy))
        p = INSTR(copy, "  ")
        WHILE p > 0
            copy = MID$(copy, 1, p - 1) + MID$(copy, p + 1)
            p = INSTR(copy, "  ")
        WEND
    END IF
    curpos = 1
    arrpos = 0
    dpos = INSTR(curpos, copy, delim)
    DO UNTIL dpos = 0
        arr(arrpos) = MID$(copy, curpos, dpos - curpos)
        arrpos = arrpos + 1
        IF arrpos > UBOUND(arr) THEN REDIM _PRESERVE arr(UBOUND(arr) + 1000) AS STRING
        curpos = dpos + LEN(delim)
        dpos = INSTR(curpos, copy, delim)
    LOOP
    arr(arrpos) = MID$(copy, curpos)
    REDIM _PRESERVE arr(arrpos) AS STRING 'need this line? YES to get the ubound correct
END SUB
 
 
'notes: REDIM the array(0) to be loaded before calling Split '<<<<<<<<<<<<<<<<<<<<<<< IMPORTANT!!!!
SUB Split (mystr AS STRING, delim AS STRING, arr() AS STRING)
    ' bplus modifications of Galleon fix of Bulrush Split reply #13
    ' http://xmaxw.[abandoned, outdated and now likely malicious qb64 dot net website - don’t go there]/forum/index.php?topic=1612.0
    ' this sub further developed and tested here: \test\Strings\Split test.bas
    DIM copy AS STRING, p AS LONG, curpos AS LONG, arrpos AS LONG, dpos AS LONG
 
    copy = mystr 'make copy since we are messing with mystr when the delimiter is a space
 
    'special case if delim is space, probably want to remove all excess space
    IF delim = " " THEN
        copy = RTRIM$(LTRIM$(copy))
        p = INSTR(copy, "  ")
        WHILE p > 0
            copy = MID$(copy, 1, p - 1) + MID$(copy, p + 1)
            p = INSTR(copy, "  ")
        WEND
    END IF
    curpos = 1
    arrpos = 0
    dpos = INSTR(curpos, copy, delim)
    DO UNTIL dpos = 0
        arr(arrpos) = MID$(copy, curpos, dpos - curpos)
        arrpos = arrpos + 1
        REDIM _PRESERVE arr(arrpos + 1000) AS STRING
        curpos = dpos + LEN(delim)
        dpos = INSTR(curpos, copy, delim)
    LOOP
    arr(arrpos) = MID$(copy, curpos)
    REDIM _PRESERVE arr(arrpos) AS STRING 'need this line? YES to get the ubound correct
END SUB
 
 
' Luke 2019-02-15
'Split in$ into pieces, chopping at every occurrence of delimiter$. Multiple consecutive occurrences
'of delimiter$ are treated as a single instance. The chopped pieces are stored in result$().
'
'delimiter$ must be one character long.
'result$() must have been REDIMmed previously.
SUB Lsplit (in$, delimiter$, result$())
    REDIM result$(-1)
    start = 1
    DO
        WHILE MID$(in$, start, 1) = delimiter$
            start = start + 1
            IF start > LEN(in$) THEN EXIT SUB
        WEND
        finish = INSTR(start, in$, delimiter$)
        IF finish = 0 THEN finish = LEN(in$) + 1
        REDIM _PRESERVE result$(0 TO UBOUND(result$) + 1)
        result$(UBOUND(result$)) = MID$(in$, start, finish - start)
        start = finish + 1
    LOOP WHILE start <= LEN(in$)
END SUB
 
'Combine all elements of in$() into a single string with delimiter$ separating the elements.
FUNCTION join$ (in$(), delimiter$)
    result$ = in$(LBOUND(in$))
    FOR i = LBOUND(in$) + 1 TO UBOUND(in$)
        result$ = result$ + delimiter$ + in$(i)
    NEXT i
    join$ = result$
END FUNCTION
 
SUB SteveSplit (text$, delimiter$, storage_array() AS STRING)
    STATIC count AS LONG
    count = count + 1
    u = UBOUND(storage_array)
    IF count > u THEN REDIM _PRESERVE storage_array(u + 1000) AS STRING
    i = INSTR(text$, delimiter$)
    IF i THEN
        storage_array(count) = LEFT$(text$, i - 1)
        SteveSplit MID$(text$, i + LEN(delimiter$)), delimiter$, storage_array()
    ELSE
        storage_array(count) = text$
        REDIM _PRESERVE storage_array(count) AS STRING
        count = 0
    END IF
END SUB
 
SUB SteveSplit2 (text$, delimiter$, storage_array() AS STRING, Options AS INTEGER)
    IF Options AND 1 THEN text$ = LTRIM$(text$)
    IF Options AND 2 THEN text$ = RTRIM$(text$)
    count = 1: oldi = 1
    l = LEN(delimiter$)
    u = UBOUND(storage_array)
    IF u < 1 THEN REDIM _PRESERVE storage_array(1000) AS STRING
    DO
        i = INSTR(oldi, text$, delimiter$)
        IF i THEN
            length = i - oldi
            u = UBOUND(storage_array)
            storage_array(count) = MID$(text$, oldi, length)
            IF (Options AND 4) AND (LEN(storage_array(count)) = 0) THEN
                count = count - 1 'remove null-strings.
            END IF
            oldi = i + l
            i = oldi
            count = count + 1
            IF count > u THEN REDIM _PRESERVE storage_array(u + 1000) AS STRING
        END IF
    LOOP UNTIL i = 0
    storage_array(count) = MID$(text$, oldi)
    REDIM _PRESERVE storage_array(count) AS STRING
END SUB
 

Added feature: SteveSplit2 now takes a simple Options flag which you can use to reject null strings as results in your storage_array, so it'll behave exactly like the other routines. Just change CONST NoNull = -1 to 0 and see how it toggles between the two behaviours.

(And both are the fastest splitters yet, so now it might not sound so bad to say, "I pulled a fast one." ;D )

((And, if you like keeping the null-strings inside a split routine, as I generally do, you can remove those IF checks and make it even speedier. ))
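
For anyone following along, here is a minimal sketch of how the Options argument behaves as a bit mask, going by the IF tests in SteveSplit2 above (1 = LTRIM$, 2 = RTRIM$, 4 = drop null strings). The CONST names are made up for illustration and are not part of the posted code. -1 works as "everything on" because it has every bit set.

```qb64
' Sketch only: the bit values are read off the IF tests in SteveSplit2;
' TrimLeft, TrimRight and DropNulls are hypothetical names.
CONST TrimLeft = 1
CONST TrimRight = 2
CONST DropNulls = 4

DIM opt AS INTEGER

opt = -1 ' all bits set: trims both ends and drops null strings
IF opt AND DropNulls THEN PRINT "-1 drops nulls"

opt = 0 ' no bits set: nothing is trimmed and null strings are kept
IF (opt AND DropNulls) = 0 THEN PRINT "0 keeps nulls"

opt = DropNulls ' only drop null strings, leave the ends untrimmed
IF (opt AND TrimLeft) = 0 THEN PRINT "4 does not trim"
```

All three messages print. Passing 4 instead of -1 should give the "no nulls" behaviour without the two trim steps, assuming the bit meanings above are read correctly.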

bplus · « **Reply #13 on:** February 17, 2019, 03:13:10 pm »

Performs great Steve!

They are now running neck and neck on the big test, and I agree that without the IF checks for Options, SteveSplit2 should be faster yet.
So I might have another fix to speed up Split1000, because in my code the check for a space delimiter is a sort of Option check done before the main part of the Split code runs.
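
The "Option check" here is the pre-pass at the top of Split1000 that trims the string and collapses runs of spaces. A minimal sketch of that step on its own, lifted into a helper (CollapseSpaces$ is a hypothetical name, not something from this thread), shows why it costs time: every pass through the loop rebuilds the whole string.

```qb64
' Sketch only: the space pre-pass from Split1000 in a hypothetical helper.
PRINT "["; CollapseSpaces$(" test test    test "); "]"

FUNCTION CollapseSpaces$ (s AS STRING)
    DIM copy AS STRING, p AS LONG
    copy = RTRIM$(LTRIM$(s)) ' drop leading and trailing spaces
    p = INSTR(copy, "  ") ' find the next pair of spaces
    WHILE p > 0
        ' remove one space of the pair; this copies the whole string each time
        copy = MID$(copy, 1, p - 1) + MID$(copy, p + 1)
        p = INSTR(copy, "  ")
    WEND
    CollapseSpaces$ = copy
END FUNCTION
```

This prints [test test test]. With any delimiter other than a single space the pre-pass is skipped entirely, so only the main INSTR loop runs.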

bplus · « **Reply #14 on:** February 17, 2019, 04:44:29 pm »

One tiny mod to Split1000, plus another test showing that it outpaces SteveSplit2 when the delimiter isn't a space, probably because of the extra decision about Options in SteveSplit2. (The space delimiter is what slows Split1000 down, which is especially clear now that Limit is increased.)

Code: QB64: [Select]

'split test.bas for qb64 bplus 2018-05-07
' directly below is Steve's Timed test orig tests commented out
' 2019-02-17 modified by B+ with a new Split1000 sub and a SERIOUS String to Split!
 
'=================================================================== Steve's speed test
CONST ntests = 7
DIM a(ntests - 1) AS STRING, d(ntests - 1) AS STRING
REDIM results1(0) AS STRING
REDIM results2(0) AS STRING
REDIM results3(0) AS STRING
 
CONST Limit = 1000
CONST NoNull = -1
 
'trivial tests just to test accuracy of split
a(0) = ""
d(0) = " "
a(1) = " test test    test " 'good no error!
d(1) = " "
a(2) = " test"
d(2) = " "
a(3) = "3d,z6d,z1 10 #d,z5"
d(3) = ",z"
a(4) = "Monday, , Wednesday, THursday, Friday, , Sunday"
d(4) = ", "
 
'let's get a serious test in here!! test a string of 10,000 random numbers
FOR i = 1 TO 10000
    s$ = s$ + STR$(RND)
NEXT
a(5) = s$
d(5) = " "
FOR i = 1 TO 10000
    IF i = 1 THEN s$ = STR$(RND * 1000 \ 1) ELSE s$ = s$ + "," + STR$(RND * 1000 \ 1)
NEXT
a(6) = s$
d(6) = ", "
 
 
FOR i = 0 TO ntests - 1
    CLS
    t# = TIMER
    FOR j = 1 TO Limit 'repeat the process multiple times so we can time it.
        SteveSplit2 a(i), d(i), results1(), NoNull
    NEXT
    t1# = TIMER
    FOR j = 1 TO Limit 'repeat the process multiple times so we can time it.
        REDIM results2(0) AS STRING
        Split1000 a(i), d(i), results2()
    NEXT
    t2# = TIMER
    FOR j = 1 TO Limit 'repeat the process multiple times so we can time it.
        Lsplit a(i), d(i), results3()
    NEXT
    t3# = TIMER
    PRINT "TEST #"; i; " -- Splitting: "; CHR$(34); MID$(a(i), 1, 80); CHR$(34); " with "; CHR$(34); d(i); CHR$(34)
    PRINT: PRINT "Test names:", "SteveSplit2", "Split1000", "Lsplit"
    PRINT "Times:",
    PRINT USING "###.####     ###.####     ###.####"; t1# - t#, t2# - t1#, t3# - t2#
    PRINT: PRINT "First Items in Results arrays (up to 10):"
    FOR j = 1 TO 10
        PRINT j,
        IF j <= UBOUND(results1) THEN PRINT CHR$(34); MID$(results1(j), 1, 15); CHR$(34), ELSE PRINT "      ",
        IF j <= UBOUND(results2) + 1 THEN PRINT CHR$(34); results2(j - 1); CHR$(34), ELSE PRINT "      ",
        IF j <= UBOUND(results3) + 1 THEN PRINT CHR$(34); results3(j - 1); CHR$(34), ELSE PRINT "      ",
        PRINT
    NEXT
    PRINT: INPUT "Press enter for next test... "; wate$
 
NEXT
 
 
'' ================================= My Old Split test Code
'the space delimiter is such a special case, perhaps I should develop a single split for that alone?
''2018-08-25 reworked for space delimiters and more variable declares
''2019-02-15 add Luke's version to compare
'ntests = 5
'DIM a(ntests - 1) AS STRING, d(ntests - 1) AS STRING
 
'a(0) = ""
'd(0) = " "
'a(1) = " test test    test " 'good no error!
'd(1) = " "
'a(2) = " test"
'd(2) = " "
'a(3) = "3d,z6d,z1 10 #d,z5"
'd(3) = ",z"
'a(4) = "Monday, , Wednesday, THursday, Friday, , Sunday"
'd(4) = ", "
'REDIM myarr(0) AS STRING '<<<<< REDIM forces the creation of a dynamic/resizable array
'FOR test = 0 TO ntests - 1
'    PRINT: PRINT "splitting {"; a(test); "} with delimeter {"; d(test); "}"
'    Split1000 a(test), d(test), myarr()
'    amax = UBOUND(myarr)
'    FOR i = 0 TO amax
'        PRINT i; ":"; myarr(i)
'    NEXT i
'    INPUT "press enter for next test... "; wate$
'NEXT
 
'' how about a quick file reader test?
'PRINT: INPUT "Press enter for file test, any other + enter quits! "; wate$
'IF LEN(wate$) THEN END
'CLS
 
''other wise continue
'OPEN "Split test.bas" FOR BINARY AS #1 '<<< this file name!!!
'ftext$ = SPACE$(LOF(1))
'GET #1, , ftext$
'CLOSE #1
'Split ftext$, CHR$(13) + CHR$(10), myarr()
'FOR i = 0 TO UBOUND(myarr)
'    PRINT myarr(i)
'    IF i MOD 20 = 19 THEN PRINT: INPUT "press enter for more "; wate$
'NEXT
'PRINT "the end"
'END ' end program
 
'
'notes: REDIM the array(0) to be loaded before calling Split '<<<<<<<<<<<<<<<<<<<<<<< IMPORTANT!!!!
SUB Split1000 (mystr AS STRING, delim AS STRING, arr() AS STRING)
    ' bplus modifications of Galleon fix of Bulrush Split reply #13
    ' http://xmaxw.[abandoned, outdated and now likely malicious qb64 dot net website - don’t go there]/forum/index.php?topic=1612.0
    ' this sub further developed and tested here: \test\Strings\Split test.bas
    DIM copy AS STRING, p AS LONG, curpos AS LONG, arrpos AS LONG, dpos AS LONG
 
    copy = mystr 'make copy since we are messing with mystr when the delimiter is a space
 
    'special case if delim is space, probably want to remove all excess space
    IF delim = " " THEN
        copy = RTRIM$(LTRIM$(copy))
        p = INSTR(copy, "  ")
        WHILE p > 0
            copy = MID$(copy, 1, p - 1) + MID$(copy, p + 1)
            p = INSTR(copy, "  ")
        WEND
    END IF
    curpos = 1
    arrpos = 0
    DIM LD AS LONG: LD = LEN(delim) 'mod: get the delimiter length once, outside the loop
    dpos = INSTR(curpos, copy, delim)
    DO UNTIL dpos = 0
        arr(arrpos) = MID$(copy, curpos, dpos - curpos)
        arrpos = arrpos + 1
        IF arrpos > UBOUND(arr) THEN REDIM _PRESERVE arr(UBOUND(arr) + 1000) AS STRING
        curpos = dpos + LD
        dpos = INSTR(curpos, copy, delim)
    LOOP
    arr(arrpos) = MID$(copy, curpos)
    REDIM _PRESERVE arr(arrpos) AS STRING 'need this line? YES to get the ubound correct
END SUB
 
 
'notes: REDIM the array(0) to be loaded before calling Split '<<<<<<<<<<<<<<<<<<<<<<< IMPORTANT!!!!
SUB Split (mystr AS STRING, delim AS STRING, arr() AS STRING)
    ' bplus modifications of Galleon fix of Bulrush Split reply #13
    ' http://xmaxw.[abandoned, outdated and now likely malicious qb64 dot net website - don’t go there]/forum/index.php?topic=1612.0
    ' this sub further developed and tested here: \test\Strings\Split test.bas
    DIM copy AS STRING, p AS LONG, curpos AS LONG, arrpos AS LONG, dpos AS LONG
 
    copy = mystr 'make copy since we are messing with mystr when the delimiter is a space
 
    'special case if delim is space, probably want to remove all excess space
    IF delim = " " THEN
        copy = RTRIM$(LTRIM$(copy))
        p = INSTR(copy, "  ")
        WHILE p > 0
            copy = MID$(copy, 1, p - 1) + MID$(copy, p + 1)
            p = INSTR(copy, "  ")
        WEND
    END IF
    curpos = 1
    arrpos = 0
    dpos = INSTR(curpos, copy, delim)
    DO UNTIL dpos = 0
        arr(arrpos) = MID$(copy, curpos, dpos - curpos)
        arrpos = arrpos + 1
        REDIM _PRESERVE arr(arrpos + 1000) AS STRING
        curpos = dpos + LEN(delim)
        dpos = INSTR(curpos, copy, delim)
    LOOP
    arr(arrpos) = MID$(copy, curpos)
    REDIM _PRESERVE arr(arrpos) AS STRING 'need this line? YES to get the ubound correct
END SUB
 
 
' Luke 2019-02-15
'Split in$ into pieces, chopping at every occurrence of delimiter$. Multiple consecutive occurrences
'of delimiter$ are treated as a single instance. The chopped pieces are stored in result$().
'
'delimiter$ must be one character long.
'result$() must have been REDIMmed previously.
SUB Lsplit (in$, delimiter$, result$())
    REDIM result$(-1)
    start = 1
    DO
        WHILE MID$(in$, start, 1) = delimiter$
            start = start + 1
            IF start > LEN(in$) THEN EXIT SUB
        WEND
        finish = INSTR(start, in$, delimiter$)
        IF finish = 0 THEN finish = LEN(in$) + 1
        REDIM _PRESERVE result$(0 TO UBOUND(result$) + 1)
        result$(UBOUND(result$)) = MID$(in$, start, finish - start)
        start = finish + 1
    LOOP WHILE start <= LEN(in$)
END SUB
 
'Combine all elements of in$() into a single string with delimiter$ separating the elements.
FUNCTION join$ (in$(), delimiter$)
    result$ = in$(LBOUND(in$))
    FOR i = LBOUND(in$) + 1 TO UBOUND(in$)
        result$ = result$ + delimiter$ + in$(i)
    NEXT i
    join$ = result$
END FUNCTION
 
SUB SteveSplit2 (text$, delimiter$, storage_array() AS STRING, Options AS INTEGER)
    IF Options AND 1 THEN text$ = LTRIM$(text$)
    IF Options AND 2 THEN text$ = RTRIM$(text$)
    count = 1: oldi = 1
    l = LEN(delimiter$)
    u = UBOUND(storage_array)
    IF u < 1 THEN REDIM _PRESERVE storage_array(1000) AS STRING
    DO
        i = INSTR(oldi, text$, delimiter$)
        IF i THEN
            length = i - oldi
            u = UBOUND(storage_array)
            storage_array(count) = MID$(text$, oldi, length)
            IF (Options AND 4) AND (LEN(storage_array(count)) = 0) THEN
                count = count - 1 'remove null-strings.
            END IF
            oldi = i + l
            i = oldi
            count = count + 1
            IF count > u THEN REDIM _PRESERVE storage_array(u + 1000) AS STRING
        END IF
    LOOP UNTIL i = 0
    storage_array(count) = MID$(text$, oldi)
    REDIM _PRESERVE storage_array(count) AS STRING
END SUB
 
SUB SteveSplit (text$, delimiter$, storage_array() AS STRING)
    STATIC count AS LONG
    count = count + 1
    u = UBOUND(storage_array)
    IF count > u THEN REDIM _PRESERVE storage_array(u + 1000) AS STRING
    i = INSTR(text$, delimiter$)
    IF i THEN
        storage_array(count) = LEFT$(text$, i - 1)
        SteveSplit MID$(text$, i + LEN(delimiter$)), delimiter$, storage_array()
    ELSE
        storage_array(count) = text$
        REDIM _PRESERVE storage_array(count) AS STRING
        count = 0
    END IF
END SUB
 
 
