Topic: Versatile String Parsing Function by RhoSigma

Posted by Junior Librarian (Moderator)
« on: September 19, 2021, 04:52:02 am »
Versatile String Parsing Function

Author: @RhoSigma
Source: qb64.org Forum
URL: https://www.qb64.org/forum/index.php?topic=4142.0
Version: 2021-08-27

Author's Description:
I guess every developer sooner or later needs a parsing function like this, whether it is to split a simple text line into its individual words, quickly read CSV data into an array, break a path specification into its folder names, or extract the individual options of a command line or a URL query string.
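
As a rough illustration of one of the use cases listed above, the following sketch splits a path into its folder names. It assumes that the ParseLine& listing from this post is pasted at the end of the program. The array name parts$ and the sample path are made up for the example, and the elements noted in the comment come from tracing the listing by hand, not from the author's documentation.

```qb64
'--- hypothetical usage sketch; requires the ParseLine& function
'--- from this post to be pasted below the END statement
REDIM parts$(0) 'must be dynamic, ParseLine& resizes it
DIM ub&, i&

'--- split at backslash or slash, default quote char (empty quoChars$)
ub& = ParseLine&("C:\QB64\programs\test.bas", "\/", "", parts$(), 0)

'--- a result of -1 means that no component was found
IF ub& >= 0 THEN
    FOR i& = LBOUND(parts$) TO ub&
        PRINT parts$(i&) 'C:, QB64, programs, test.bas
    NEXT i&
END IF
END
```

Judging from the listing, consecutive separators are skipped as a group, so empty components (for example in "a,,b" split at commas) do not appear in the output array.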

Obviously, such a function must recognize several separator characters and must be able to suppress splitting inside quoted sections. A special feature of this function is that it can optionally use different characters for opening and closing quotes, which makes it possible, for example, to read out sections in parentheses or brackets.
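
To make the quoting behavior a bit more concrete, here is a minimal sketch that first uses the default double quote and then a "()" pair as opening and closing quote chars. It assumes that the ParseLine& listing from this post is pasted at the end of the program. The names flds$ and q$ and the sample lines are made up for the example, and the results in the comments come from tracing the listing by hand.

```qb64
'--- hypothetical usage sketch; requires the ParseLine& function
'--- from this post to be pasted below the END statement
REDIM flds$(0) 'must be dynamic, ParseLine& resizes it
DIM ub&, i&, q$
q$ = CHR$(34) 'the double quote, used when quoChars$ is empty

'--- the comma inside the quoted section does not split the field
ub& = ParseLine&("1," + q$ + "Smith, John" + q$ + ",42", ",", "", flds$(), 0)
FOR i& = LBOUND(flds$) TO ub&
    PRINT flds$(i&) '1 / Smith, John / 42
NEXT i&

'--- "(" opens and ")" closes a section that is kept together
ub& = ParseLine&("sum (a b) end", " ", "()", flds$(), 0)
FOR i& = LBOUND(flds$) TO ub&
    PRINT flds$(i&) 'sum / a b / end
NEXT i&
END
```

In both calls the quote chars themselves are stripped from the returned component.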

For usage, see the full description in the separate HTML document (provided as a compressed file).



Source Code:
Code (QB64):
'--- Full description available in separate HTML document.
'---------------------------------------------------------------------
FUNCTION ParseLine& (inpLine$, sepChars$, quoChars$, outArray$(), minUB&)
'--- option _explicit requirements ---
DIM ilen&, icnt&, slen%, s1%, s2%, s3%, s4%, s5%, q1%, q2%
DIM oalb&, oaub&, ocnt&, flag%, ch%, nest%, spos&, epos&
'--- so far return nothing ---
ParseLine& = -1
'--- init & check some runtime variables ---
ilen& = LEN(inpLine$): icnt& = 1
IF ilen& = 0 THEN EXIT FUNCTION
slen% = LEN(sepChars$)
IF slen% > 0 THEN s1% = ASC(sepChars$, 1)
IF slen% > 1 THEN s2% = ASC(sepChars$, 2)
IF slen% > 2 THEN s3% = ASC(sepChars$, 3)
IF slen% > 3 THEN s4% = ASC(sepChars$, 4)
IF slen% > 4 THEN s5% = ASC(sepChars$, 5)
IF slen% > 5 THEN slen% = 5 'max. 5 chars, ignore the rest
IF LEN(quoChars$) > 0 THEN q1% = ASC(quoChars$, 1): ELSE q1% = 34
IF LEN(quoChars$) > 1 THEN q2% = ASC(quoChars$, 2): ELSE q2% = q1%
oalb& = LBOUND(outArray$): oaub& = UBOUND(outArray$): ocnt& = oalb&
'--- skip preceding separators ---
plSkipSepas:
flag% = 0
WHILE icnt& <= ilen& AND NOT flag%
    ch% = ASC(inpLine$, icnt&)
    SELECT CASE slen%
        CASE 0: flag% = -1
        CASE 1: flag% = ch% <> s1%
        CASE 2: flag% = ch% <> s1% AND ch% <> s2%
        CASE 3: flag% = ch% <> s1% AND ch% <> s2% AND ch% <> s3%
        CASE 4: flag% = ch% <> s1% AND ch% <> s2% AND ch% <> s3% AND ch% <> s4%
        CASE 5: flag% = ch% <> s1% AND ch% <> s2% AND ch% <> s3% AND ch% <> s4% AND ch% <> s5%
    END SELECT
    icnt& = icnt& + 1
WEND
IF NOT flag% THEN 'nothing else? - then exit
    IF ocnt& > oalb& GOTO plEnd
    EXIT FUNCTION
END IF
'--- redim to clear array on 1st word/component ---
IF ocnt& = oalb& THEN REDIM outArray$(oalb& TO oaub&)
'--- expand array, if required ---
plNextWord:
IF ocnt& > oaub& THEN
    oaub& = oaub& + 10
    REDIM _PRESERVE outArray$(oalb& TO oaub&)
END IF
'--- get current word/component until next separator ---
flag% = 0: nest% = 0: spos& = icnt& - 1
WHILE icnt& <= ilen& AND NOT flag%
    IF ch% = q1% AND nest% = 0 THEN
        nest% = 1
    ELSEIF ch% = q1% AND nest% > 0 THEN
        nest% = nest% + 1
    ELSEIF ch% = q2% AND nest% > 0 THEN
        nest% = nest% - 1
    END IF
    ch% = ASC(inpLine$, icnt&)
    SELECT CASE slen%
        CASE 0: flag% = (nest% = 0 AND (ch% = q1%)) OR (nest% = 1 AND ch% = q2%)
        CASE 1: flag% = (nest% = 0 AND (ch% = s1% OR ch% = q1%)) OR (nest% = 1 AND ch% = q2%)
        CASE 2: flag% = (nest% = 0 AND (ch% = s1% OR ch% = s2% OR ch% = q1%)) OR (nest% = 1 AND ch% = q2%)
        CASE 3: flag% = (nest% = 0 AND (ch% = s1% OR ch% = s2% OR ch% = s3% OR ch% = q1%)) OR (nest% = 1 AND ch% = q2%)
        CASE 4: flag% = (nest% = 0 AND (ch% = s1% OR ch% = s2% OR ch% = s3% OR ch% = s4% OR ch% = q1%)) OR (nest% = 1 AND ch% = q2%)
        CASE 5: flag% = (nest% = 0 AND (ch% = s1% OR ch% = s2% OR ch% = s3% OR ch% = s4% OR ch% = s5% OR ch% = q1%)) OR (nest% = 1 AND ch% = q2%)
    END SELECT
    icnt& = icnt& + 1
WEND
epos& = icnt& - 1
IF ASC(inpLine$, spos&) = q1% THEN spos& = spos& + 1
outArray$(ocnt&) = MID$(inpLine$, spos&, epos& - spos&)
ocnt& = ocnt& + 1
'--- more words/components following? ---
IF flag% AND ch% = q1% AND nest% = 0 GOTO plNextWord
IF flag% GOTO plSkipSepas
IF (ch% <> q1%) AND (ch% <> q2% OR nest% = 0) THEN outArray$(ocnt& - 1) = outArray$(ocnt& - 1) + CHR$(ch%)
'--- final array size adjustment, then exit ---
plEnd:
IF ocnt& - 1 < minUB& THEN ocnt& = minUB& + 1
REDIM _PRESERVE outArray$(oalb& TO (ocnt& - 1))
ParseLine& = ocnt& - 1
END FUNCTION
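
Reading the listing, two details of the return value seem worth a small sketch: the function returns -1 when the input is empty or consists of separators only, and minUB& makes sure the array reaches at least up to index minUB&, even if fewer components were found. The array name cols$ and the sample inputs are made up for the example, and the results in the comments come from tracing the listing by hand. As before, the ParseLine& function is assumed to be pasted at the end of the program.

```qb64
'--- hypothetical usage sketch; requires the ParseLine& function
'--- from this post to be pasted below the END statement
REDIM cols$(0) 'must be dynamic, ParseLine& resizes it
DIM ub&

'--- only two fields in the line, but indices 0 to 4 are requested
ub& = ParseLine&("a b", " ", "", cols$(), 4)
PRINT ub&; UBOUND(cols$) 'both 4 here, cols$(2) to cols$(4) are empty

'--- separators only: nothing to return, array is left untouched
ub& = ParseLine&("   ", " ", "", cols$(), 4)
IF ub& = -1 THEN PRINT "no components found"
END
```

Padding with minUB& can be handy when reading CSV lines with a fixed number of columns, since later code can then index every column without checking the bounds first.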

« Last Edit: September 25, 2021, 06:22:26 am by Junior Librarian »