Versatile String Parsing Function by RhoSigma

Junior Librarian:
Versatile String Parsing Function

Author: @RhoSigma
Source: qb64.org Forum
URL: https://www.qb64.org/forum/index.php?topic=4142.0
Version: 2021-08-27

Author's Description:
I guess every developer sooner or later needs a parsing function like this, whether to split a simple text line into its individual words, quickly read CSV data into an array, break a path specification into its folder names, or get the individual options of a command line or a URL query string.
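
As a rough illustration of two of these use cases (this is not part of the author's description), the sketch below splits a sentence into words and a path into folder names. It assumes the ParseLine& function from the Source Code section has been appended to the program; the array name, variable names and sample strings are made up for the example.

```qb64
'Illustrative sketch only. Append the ParseLine& FUNCTION from the
'"Source Code" section to the end of this program before compiling.
REDIM parts$(0 TO 0) 'dynamic array, ParseLine& resizes it as needed
DIM last&, i&

'Split a sentence at spaces and commas (two separator chars).
last& = ParseLine&("The quick, brown fox", " ,", "", parts$(), 0)
FOR i& = 0 TO last&
    PRINT "["; parts$(i&); "]"
NEXT i&

'Split a path at backslashes.
last& = ParseLine&("C:\Users\Public\Documents", "\", "", parts$(), 0)
FOR i& = 0 TO last&
    PRINT "["; parts$(i&); "]"
NEXT i&
```

Traced by hand against the source, the first call should yield The, quick, brown, fox and the second C:, Users, Public, Documents. In both cases the return value is 3, the index of the last component, not the component count.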

Obviously, such a function must recognize several separator characters and must be able to suppress splitting inside quoted sections. A special feature of this function is that it can optionally use different characters for the opening and closing quotes, which makes it possible, for example, to read out sections enclosed in parentheses or brackets.
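
To make the quoting behaviour more concrete, here is a small sketch (again an illustration, not the author's documented usage). It assumes the ParseLine& function from the Source Code section is appended to the program. Judging from the source, an empty quoChars$ falls back to the double quote (ASCII 34), and a two-character quoChars$ such as "()" sets separate opening and closing characters. The sample strings and variable names are hypothetical.

```qb64
'Illustrative sketch only. Append the ParseLine& FUNCTION from the
'"Source Code" section to the end of this program before compiling.
REDIM parts$(0 TO 0) 'dynamic array, ParseLine& resizes it as needed
DIM last&, i&, q$
q$ = CHR$(34) 'the double quote character

'Empty quoChars$: the double quote is used for opening and closing.
last& = ParseLine&("alpha " + q$ + "beta gamma" + q$ + " delta", " ", "", parts$(), 0)
FOR i& = 0 TO last&
    PRINT "["; parts$(i&); "]"
NEXT i&

'Different opening and closing chars: "(" opens, ")" closes.
last& = ParseLine&("add (1 2) end", " ", "()", parts$(), 0)
FOR i& = 0 TO last&
    PRINT "["; parts$(i&); "]"
NEXT i&
```

Traced by hand, the first call should give alpha, beta gamma, delta and the second add, 1 2, end. The enclosing quote characters are not part of the returned components, and the space inside the quoted section does not split it.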

For usage, see the full description in the separate HTML document (inside the compressed file).


Source Code:

--- Code: QB64: ---
'--- Full description available in separate HTML document.
'---------------------------------------------------------------------
FUNCTION ParseLine& (inpLine$, sepChars$, quoChars$, outArray$(), minUB&)
'--- option _explicit requirements ---
DIM ilen&, icnt&, slen%, s1%, s2%, s3%, s4%, s5%, q1%, q2%
DIM oalb&, oaub&, ocnt&, flag%, ch%, nest%, spos&, epos&
'--- so far return nothing ---
ParseLine& = -1
'--- init & check some runtime variables ---
ilen& = LEN(inpLine$): icnt& = 1
IF ilen& = 0 THEN EXIT FUNCTION
slen% = LEN(sepChars$)
IF slen% > 0 THEN s1% = ASC(sepChars$, 1)
IF slen% > 1 THEN s2% = ASC(sepChars$, 2)
IF slen% > 2 THEN s3% = ASC(sepChars$, 3)
IF slen% > 3 THEN s4% = ASC(sepChars$, 4)
IF slen% > 4 THEN s5% = ASC(sepChars$, 5)
IF slen% > 5 THEN slen% = 5 'max. 5 chars, ignore the rest
IF LEN(quoChars$) > 0 THEN q1% = ASC(quoChars$, 1): ELSE q1% = 34
IF LEN(quoChars$) > 1 THEN q2% = ASC(quoChars$, 2): ELSE q2% = q1%
oalb& = LBOUND(outArray$): oaub& = UBOUND(outArray$): ocnt& = oalb&
'--- skip preceding separators ---
plSkipSepas:
flag% = 0
WHILE icnt& <= ilen& AND NOT flag%
    ch% = ASC(inpLine$, icnt&)
    SELECT CASE slen%
        CASE 0: flag% = -1
        CASE 1: flag% = ch% <> s1%
        CASE 2: flag% = ch% <> s1% AND ch% <> s2%
        CASE 3: flag% = ch% <> s1% AND ch% <> s2% AND ch% <> s3%
        CASE 4: flag% = ch% <> s1% AND ch% <> s2% AND ch% <> s3% AND ch% <> s4%
        CASE 5: flag% = ch% <> s1% AND ch% <> s2% AND ch% <> s3% AND ch% <> s4% AND ch% <> s5%
    END SELECT
    icnt& = icnt& + 1
WEND
IF NOT flag% THEN 'nothing else? - then exit
    IF ocnt& > oalb& GOTO plEnd
    EXIT FUNCTION
END IF
'--- redim to clear array on 1st word/component ---
IF ocnt& = oalb& THEN REDIM outArray$(oalb& TO oaub&)
'--- expand array, if required ---
plNextWord:
IF ocnt& > oaub& THEN
    oaub& = oaub& + 10
    REDIM _PRESERVE outArray$(oalb& TO oaub&)
END IF
'--- get current word/component until next separator ---
flag% = 0: nest% = 0: spos& = icnt& - 1
WHILE icnt& <= ilen& AND NOT flag%
    IF ch% = q1% AND nest% = 0 THEN
        nest% = 1
    ELSEIF ch% = q1% AND nest% > 0 THEN
        nest% = nest% + 1
    ELSEIF ch% = q2% AND nest% > 0 THEN
        nest% = nest% - 1
    END IF
    ch% = ASC(inpLine$, icnt&)
    SELECT CASE slen%
        CASE 0: flag% = (nest% = 0 AND (ch% = q1%)) OR (nest% = 1 AND ch% = q2%)
        CASE 1: flag% = (nest% = 0 AND (ch% = s1% OR ch% = q1%)) OR (nest% = 1 AND ch% = q2%)
        CASE 2: flag% = (nest% = 0 AND (ch% = s1% OR ch% = s2% OR ch% = q1%)) OR (nest% = 1 AND ch% = q2%)
        CASE 3: flag% = (nest% = 0 AND (ch% = s1% OR ch% = s2% OR ch% = s3% OR ch% = q1%)) OR (nest% = 1 AND ch% = q2%)
        CASE 4: flag% = (nest% = 0 AND (ch% = s1% OR ch% = s2% OR ch% = s3% OR ch% = s4% OR ch% = q1%)) OR (nest% = 1 AND ch% = q2%)
        CASE 5: flag% = (nest% = 0 AND (ch% = s1% OR ch% = s2% OR ch% = s3% OR ch% = s4% OR ch% = s5% OR ch% = q1%)) OR (nest% = 1 AND ch% = q2%)
    END SELECT
    icnt& = icnt& + 1
WEND
epos& = icnt& - 1
IF ASC(inpLine$, spos&) = q1% THEN spos& = spos& + 1
outArray$(ocnt&) = MID$(inpLine$, spos&, epos& - spos&)
ocnt& = ocnt& + 1
'--- more words/components following? ---
IF flag% AND ch% = q1% AND nest% = 0 GOTO plNextWord
IF flag% GOTO plSkipSepas
IF (ch% <> q1%) AND (ch% <> q2% OR nest% = 0) THEN outArray$(ocnt& - 1) = outArray$(ocnt& - 1) + CHR$(ch%)
'--- final array size adjustment, then exit ---
plEnd:
IF ocnt& - 1 < minUB& THEN ocnt& = minUB& + 1
REDIM _PRESERVE outArray$(oalb& TO (ocnt& - 1))
ParseLine& = ocnt& - 1
END FUNCTION

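
The handling of the return value and of minUB& is easy to overlook when reading the source, so the following sketch tries to show it. This is an interpretation of the code above, not the author's documentation. The function appears to return -1 and leave the array untouched when no component is found. Otherwise it returns the upper bound of the resized array, which is at least minUB&. The array name and sample strings are hypothetical.

```qb64
'Illustrative sketch only. Append the ParseLine& FUNCTION from the
'"Source Code" section to the end of this program before compiling.
REDIM fields$(0 TO 0) 'dynamic array, ParseLine& resizes it as needed
DIM last&

'Only separators in the input: no component is found, so the
'initial result of -1 is returned and fields$() is not resized.
last& = ParseLine&("   ", " ", "", fields$(), 0)
IF last& < 0 THEN PRINT "no components found"

'Two components, but minUB& = 4 asks for an upper bound of at least 4.
last& = ParseLine&("x=1;y=2", ";", "", fields$(), 4)
PRINT last&; UBOUND(fields$) 'both should be 4
PRINT "["; fields$(0); "] ["; fields$(1); "] ["; fields$(2); "]"
```

Traced by hand, the second call stores x=1 and y=2 in elements 0 and 1 and leaves elements 2 to 4 as empty strings. A caller that needs the number of components actually found should therefore not rely on the return value alone when minUB& exceeds the last used index.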
Attachments:
  ParseLine.zip   ParseLine.7z
