Author Topic: Simple utility to parse out id and class styles in an html document. (Read 5299 times)

Pete · « **on:** January 16, 2020, 03:30:46 pm »

I've been converting some 40 different html pages to a responsive format, which meant making a couple of new style sheets, one of them for mobile screens. After a while, all the various ids and classes get hard to recall, so I made this little utility to parse out all the id and class attributes used in each html page.

The utility is for Windows systems...

1) Makes a list of all the html pages in the directory: c:\mywebpages
2) Parses and collects the id and class styles out of the html and places them into an array.
3) Removes duplicates from the array.
4) Writes the results to a file called: css-class.txt.
5) Opens css-class.txt in Notepad.

Anyway, it's easy to change the names of the files, directory, etc.; so, if anyone needs such a utility, well, here it is...

Code: QB64:

_SCREENMOVE 0, 0
'WIDTH 80, 43
dir$ = "c:\mywebpages\"
IF NOT _DIREXISTS(dir$) THEN PRINT "No such directory: "; dir$: END
SHELL _HIDE "dir /b " + dir$ + "*.htm*>tmp.tmp"
'SHELL _DONTWAIT "notepad tmp.tmp"
IF _FILEEXISTS("css-class.txt") THEN
    LINE INPUT "Overwrite file css-class.txt? (y/n) ", ans$
    IF LCASE$(ans$) = "n" THEN END
END IF
OPEN "css-class.txt" FOR OUTPUT AS #3
OPEN "tmp.tmp" FOR INPUT AS #1
DO UNTIL EOF(1)
    LINE INPUT #1, a$
    IF NOT _FILEEXISTS(dir$ + a$) THEN PRINT "Error: cannot find "; dir$ + a$: END
    OPEN dir$ + a$ FOR BINARY AS #2
    FileLength = LOF(2)
    z$ = SPACE$(FileLength)
    GET #2, 1, z$
    z$ = LCASE$(z$)
    DO UNTIL INSTR(z$, "  ") = 0
        z$ = MID$(z$, 1, INSTR(z$, "  ") - 1) + MID$(z$, INSTR(z$, "  ") + 1) ' Remove one space.
    LOOP
    DO UNTIL INSTR(z$, CHR$(13)) = 0
        z$ = MID$(z$, 1, INSTR(z$, CHR$(13)) - 1) + MID$(z$, INSTR(z$, CHR$(13)) + 1) ' Drop carriage returns.
    LOOP
    DO UNTIL INSTR(z$, CHR$(10)) = 0
        z$ = MID$(z$, 1, INSTR(z$, CHR$(10)) - 1) + " " + MID$(z$, INSTR(z$, CHR$(10)) + 1) ' Turn line feeds into spaces.
    LOOP
    x$ = z$
    cnt = 0: ncnt = 0: REDIM s$(1000), style$(1000)
    PRINT a$: PRINT #3, a$
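    ' Search for id=" (with the quote) so unquoted values are skipped instead of mangled.
    DO UNTIL INSTR(x$, "id=" + CHR$(34)) = 0
        x$ = MID$(x$, INSTR(x$, "id=" + CHR$(34)) + 4)
        q = INSTR(x$, CHR$(34)): IF q = 0 THEN EXIT DO ' No closing quote: stop rather than crash in MID$.
        style$ = "#" + MID$(x$, 1, q - 1)
        'PRINT style$
        cnt = cnt + 1: s$(cnt) = style$
    LOOP
    x$ = z$
    DO UNTIL INSTR(x$, "class=" + CHR$(34)) = 0
        x$ = MID$(x$, INSTR(x$, "class=" + CHR$(34)) + 7)
        q = INSTR(x$, CHR$(34)): IF q = 0 THEN EXIT DO ' No closing quote: stop rather than crash in MID$.
        style$ = "." + MID$(x$, 1, q - 1)
        'PRINT style$
        cnt = cnt + 1: s$(cnt) = style$
    LOOP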
    FOR i = 1 TO cnt
        FOR j = 1 TO cnt
            IF i <> j THEN
                IF s$(i) = s$(j) THEN
                    s$(j) = ""
                END IF
            END IF
    NEXT j, i
    FOR i = 1 TO cnt
        IF s$(i) <> "" THEN ncnt = ncnt + 1: style$(ncnt) = s$(i)
    NEXT
    FOR i = 1 TO ncnt
        PRINT #3, style$(i)
    NEXT
    CLOSE #2
    PRINT #3, ""
LOOP
CLOSE #1, #3 ' Close the file list and the results file so Notepad sees the finished output.
SHELL _DONTWAIT "notepad css-class.txt"
 

Feel free to modify it, expand on it, whatever.

Note that as coded, it lists id results first, followed by class.
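
To see that ordering on something small, here is a rough sketch that runs the same two passes over a hard-coded string instead of a file. The sample markup and the variable names q$ and p are made up for illustration, and the sketch stops a pass if a closing quote is missing. It only handles double-quoted attribute values.

```qb64
' Sketch: two-pass id/class extraction on a hard-coded sample string.
q$ = CHR$(34) ' Double-quote character.
z$ = "<div id=" + q$ + "main" + q$ + " class=" + q$ + "wide" + q$ + ">"
z$ = z$ + "<p class=" + q$ + "note" + q$ + " id=" + q$ + "intro" + q$ + ">"

' Pass 1: every id value, prefixed with #.
x$ = z$
DO UNTIL INSTR(x$, "id=" + q$) = 0
    x$ = MID$(x$, INSTR(x$, "id=" + q$) + 4) ' Skip the 4 characters of id=" to reach the value.
    p = INSTR(x$, q$): IF p = 0 THEN EXIT DO ' No closing quote: give up on this pass.
    PRINT "#" + MID$(x$, 1, p - 1)
LOOP

' Pass 2: every class value, prefixed with a dot.
x$ = z$
DO UNTIL INSTR(x$, "class=" + q$) = 0
    x$ = MID$(x$, INSTR(x$, "class=" + q$) + 7) ' Skip the 7 characters of class=" to reach the value.
    p = INSTR(x$, q$): IF p = 0 THEN EXIT DO
    PRINT "." + MID$(x$, 1, p - 1)
LOOP
```

With that sample string it should print #main and #intro first, then .wide and .note, even though the second tag has its class before its id.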

Oh, and a big thanks to Amazing Steve for posting, some months ago, that neat trick of loading the whole file at once with the GET statement. That sure beats concatenating with LINE INPUT, or using GET with a fixed-length string, like DIM a AS STRING * 1000, repeating GET #ff, , a until only a remainder is left, and then working out that remainder with LOF() math.
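
For anyone who wants just that part on its own, a minimal sketch of the whole-file read might look like the following. The file name example.html is a placeholder, and using FREEFILE for the handle is my assumption rather than anything from the utility above.

```qb64
' Sketch: read an entire file into one string with a single GET.
file$ = "example.html" ' Placeholder name; point this at a real file.
IF NOT _FILEEXISTS(file$) THEN PRINT "File not found: "; file$: END
ff = FREEFILE ' Next unused file number.
OPEN file$ FOR BINARY AS #ff
z$ = SPACE$(LOF(ff)) ' Size the string to the file length so GET fills all of it.
GET #ff, 1, z$ ' One read from byte 1 pulls in the whole file, line breaks included.
CLOSE #ff
PRINT "Read"; LEN(z$); "bytes from "; file$
```

The point of the design is that GET reads as many bytes as the string is long, so sizing the string with LOF() first means no loop and no leftover chunk to handle.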

Pete
