Author Topic: Simple utility to parse out id and class styles in an html document.  (Read 3378 times)

Pete

I've been converting some 40 different html pages to a responsive format, and one of the things I had to make was a couple of new style sheets, one being for mobile screens.  After a while, all the various ids and classes get hard to recall, so I made this little utility to parse out all the id and class attributes used in each html page.

The utility is for Windows systems...

1) Makes a list of all the html pages in the directory: c:\mywebpages
2) Parses the id and class styles out of the html and collects them in an array.
3) Removes duplicates from the array.
4) Writes the results to a file called: css-class.txt.
5) Opens css-class.txt in Notepad.

Anyway, it's easy to change the names of the files, directory, etc.; so, if anyone needs such a utility, well, here it is...

Code: QB64:
'WIDTH 80, 43
dir$ = "c:\mywebpages\"
IF NOT _DIREXISTS(dir$) THEN PRINT "No such directory: "; dir$: END
' List every .htm/.html file in dir$ (bare names only) into tmp.tmp.
SHELL _HIDE "dir /b " + dir$ + "*.htm*>tmp.tmp"
'SHELL _DONTWAIT "notepad tmp.tmp"
IF _FILEEXISTS("css-class.txt") THEN
    LINE INPUT "Overwrite file css-class.txt? (y/n) ", ans$
    IF LCASE$(ans$) = "n" THEN END
END IF
OPEN "css-class.txt" FOR OUTPUT AS #3
OPEN "tmp.tmp" FOR INPUT AS #1
DO UNTIL EOF(1) ' One pass per html file listed in tmp.tmp.
    LINE INPUT #1, a$
    IF NOT _FILEEXISTS(dir$ + a$) THEN PRINT "Error...": END
    ' Load the whole page into z$ with a single GET.
    OPEN dir$ + a$ FOR BINARY AS #2
    FileLength = LOF(2)
    z$ = SPACE$(FileLength)
    GET #2, 1, z$
    z$ = LCASE$(z$) ' Note: this also lowercases the id and class names.
    DO UNTIL INSTR(z$, "  ") = 0
        z$ = MID$(z$, 1, INSTR(z$, "  ") - 1) + MID$(z$, INSTR(z$, "  ") + 1) ' Remove one space.
    LOOP
    DO UNTIL INSTR(z$, CHR$(13)) = 0 ' Strip carriage returns.
        z$ = MID$(z$, 1, INSTR(z$, CHR$(13)) - 1) + MID$(z$, INSTR(z$, CHR$(13)) + 1)
    LOOP
    DO UNTIL INSTR(z$, CHR$(10)) = 0 ' Turn line feeds into spaces.
        z$ = MID$(z$, 1, INSTR(z$, CHR$(10)) - 1) + " " + MID$(z$, INSTR(z$, CHR$(10)) + 1)
    LOOP
    x$ = z$
    cnt = 0: ncnt = 0: REDIM s$(1000), style$(1000)
    PRINT a$: PRINT #3, a$
    ' Collect ids. Assumes the form id="name" with double quotes.
    DO UNTIL INSTR(x$, "id=") = 0
        x$ = MID$(x$, INSTR(x$, "id=") + 4) ' Skip past id=" to the name.
        style$ = "#" + MID$(x$, 1, INSTR(x$, CHR$(34)) - 1)
        'PRINT style$
        cnt = cnt + 1: s$(cnt) = style$
    LOOP
    x$ = z$
    ' Collect classes. Assumes the form class="name" with double quotes.
    DO UNTIL INSTR(x$, "class=") = 0
        x$ = MID$(x$, INSTR(x$, "class=") + 7) ' Skip past class=" to the name.
        style$ = "." + MID$(x$, 1, INSTR(x$, CHR$(34)) - 1)
        'PRINT style$
        cnt = cnt + 1: s$(cnt) = style$
    LOOP
    ' Blank out duplicates, keeping the first occurrence of each entry.
    FOR i = 1 TO cnt
        FOR j = 1 TO cnt
            IF i <> j THEN
                IF s$(i) = s$(j) THEN
                    s$(j) = ""
                END IF
            END IF
    NEXT j, i
    ' Pack the surviving entries into style$().
    FOR i = 1 TO cnt
        IF s$(i) <> "" THEN ncnt = ncnt + 1: style$(ncnt) = s$(i)
    NEXT
    FOR i = 1 TO ncnt
        PRINT #3, style$(i)
    NEXT
    CLOSE #2
    PRINT #3, ""
LOOP
CLOSE #1, #3 ' Flush the results before Notepad opens them.
SHELL _DONTWAIT "notepad css-class.txt"

Feel free to modify it, expand on it, whatever.

Note that as coded, it lists id results first, followed by class.
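
One possible expansion, as a rough sketch only: a class attribute can hold several space-separated names (class="a b"), which the code as posted lists as one entry. Assuming the attribute value has already been pulled into a string (v$ and its contents here are made up for illustration), something like this would split it into separate names:

```qb64
' Sketch: split one class attribute value into separate class names.
v$ = "nav-item active  wide" ' hypothetical value taken from class="..."
v$ = LTRIM$(RTRIM$(v$)) + " " ' trailing space so the last name is terminated too
DO UNTIL v$ = ""
    p = INSTR(v$, " ")
    IF p > 1 THEN PRINT "." + LEFT$(v$, p - 1) ' p = 1 means an empty piece, so skip it
    v$ = MID$(v$, p + 1)
LOOP
```

In the utility itself, the PRINT would become the usual cnt = cnt + 1: s$(cnt) = ... line, so the duplicate removal still applies.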

Oh, and a big thanks to Amazing Steve for posting that neat load-the-whole-file-at-once part, some months ago, with the GET statement. That sure beats concatenating with LINE INPUT, or using GET with a fixed-length string, like DIM a AS STRING * 1000, calling GET #ff, , a until only a remainder is left, and then doing LOF() math to deal with that last piece.
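
For anyone who hasn't seen that trick, here is a minimal sketch of it as a standalone function. LoadFile$ and the index.html path are hypothetical names used only for illustration, not part of the utility:

```qb64
' Hypothetical example path; point it at any existing file.
html$ = LoadFile$("c:\mywebpages\index.html")
PRINT LEN(html$); "bytes read"
END

' Hypothetical helper: returns the whole file as one string ("" if missing or empty).
FUNCTION LoadFile$ (path$)
    IF NOT _FILEEXISTS(path$) THEN EXIT FUNCTION
    ff = FREEFILE
    OPEN path$ FOR BINARY AS #ff
    IF LOF(ff) > 0 THEN
        buf$ = SPACE$(LOF(ff)) ' size the string to the file length
        GET #ff, 1, buf$ ' a single GET fills the whole string
    END IF
    CLOSE #ff
    LoadFile$ = buf$
END FUNCTION
```

The point of the sketch is that a variable-length string sized with SPACE$(LOF(...)) tells GET exactly how many bytes to read, so no chunk loop or remainder math is needed.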

Pete
« Last Edit: January 16, 2020, 04:53:19 pm by Pete »