Topic: Theoretical question of speed. AKA: Which process path to follow??? (Read 7628 times)

doppler · « **on:** May 27, 2019, 07:54:00 am »

In my obsession with reading and collecting manga (Japanese 2D stories), I always have a problem with the names of the media files. It's not a big deal when everything is named image###.ext. The problem starts when only numbers are used. It's nice to read the stuff in sequence with a known pattern, so I rename everything to ##1, ##2, ##3 ... Here is the problem: how to handle names like 1,2,3,4,5,6,7,8,9,10,11,12 or 01,02,03 ... 99,100,101,102. Can you see the problem yet?
If not, I'll help. Sorted lists from redirected, shelled DIR output look like this: 1,10,11 ... 2,20,21 ... 3,30,31. It's even worse when 3-digit numbers are in the set along with 1-digit and 2-digit numbers. Right now I have three different extensions that need renaming. So I'm doing it dirty by shelling out to a common BAT file that blindly renames everything, whether it exists or not. Slow and very inefficient.
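A minimal QB64 sketch of the ordering problem described above, assuming no number has more than 3 digits: the same six names are sorted once as plain strings and once after zero-padding with RIGHT$("000" + n$, 3). String comparison goes character by character, so "10" lands before "2" until every name has the same width.

```qb64
' Why plain string order scrambles numeric names, and how zero-padding fixes it.
' Assumption: no number in the set has more than 3 digits.
DIM raw$(1 TO 6), padded$(1 TO 6)
FOR i = 1 TO 6
    READ raw$(i)
    padded$(i) = RIGHT$("000" + raw$(i), 3) ' e.g. "2" -> "002"
NEXT
DATA 3,20,2,11,10,1

' Simple exchange sort of both arrays; string comparison is character by character
FOR i = 1 TO 5
    FOR j = i + 1 TO 6
        IF raw$(j) < raw$(i) THEN SWAP raw$(i), raw$(j)
        IF padded$(j) < padded$(i) THEN SWAP padded$(i), padded$(j)
    NEXT
NEXT

PRINT "Plain: "; ' 1 10 11 2 20 3
FOR i = 1 TO 6: PRINT raw$(i); " ";: NEXT
PRINT
PRINT "Padded: "; ' 001 002 003 010 011 020
FOR i = 1 TO 6: PRINT padded$(i); " ";: NEXT
PRINT
```

Once every numeric name has the same width, the plain string sort that DIR uses and the numeric order agree.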

I have two process paths to think about (maybe you can think of a third).

Path #1:
Check whether each possible filename exists with each of the three extensions, and rename it when found. This is another blind rename process: test every possible name and rename.
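A hedged sketch of what Path #1 could look like in QB64: probe every candidate name 1 to 99 with _FILEEXISTS and call NAME ... AS only when the source exists and the padded target does not, which also avoids the "001.ext already exists" error listed as a con below. The extensions .jpg, .png and .gif and the scratch folder rename_demo are hypothetical; the sketch creates a few empty sample files there so it never touches real files.

```qb64
' Path #1 sketch: blind probing, but only rename what actually exists.
' Assumptions: extensions .jpg/.png/.gif (hypothetical) and a scratch folder.
DIM ext$(1 TO 3)
ext$(1) = ".jpg": ext$(2) = ".png": ext$(3) = ".gif"

IF NOT _DIREXISTS("rename_demo") THEN MKDIR "rename_demo"
CHDIR "rename_demo"
' Create a few empty sample files to rename
FOR i = 1 TO 4
    READ f$
    OPEN f$ FOR OUTPUT AS #1: CLOSE #1
NEXT
DATA 1.jpg,2.png,10.gif,11.jpg

renamed = 0: skipped = 0
FOR n = 1 TO 99 ' 3-digit numbers already have the target width
    n$ = LTRIM$(STR$(n))
    p$ = RIGHT$("000" + n$, 3)
    FOR e = 1 TO 3
        src$ = n$ + ext$(e)
        dst$ = p$ + ext$(e)
        IF _FILEEXISTS(src$) THEN
            IF _FILEEXISTS(dst$) THEN
                skipped = skipped + 1 ' target already there: leave both alone
            ELSE
                NAME src$ AS dst$
                renamed = renamed + 1
            END IF
        END IF
    NEXT
NEXT
PRINT renamed; "renamed,"; skipped; "skipped"
```

The cost is 99 x 3 existence checks per folder regardless of how many files are actually there; the upside is that no directory listing or string parsing is needed.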

Path #2:
Collect the directory contents into an array in memory, then test and rename only what is found. An intelligent rename; it could be faster.

Pros and cons of the process paths (calling them #1 and #2):

#1 Pro: No name testing; the system caches filenames, so following rename calls should be faster.
#1 Con: No name testing, so every possible name has to be tried.
#1 Con: An error may be thrown every time a bad name is tried, e.g. renaming 1.ext to 001.ext when 001.ext already exists.

#2 Pro: Faster name testing. Only attempt to rename required files.
#2 Con: String searching, parsing and memory array usage.

Path #1 would be less effort, but the result may be the same as what I am doing now.
Path #2 takes more of my time to create, but it could still need the same time to execute as now. Wasted effort.

OK, either process will get me where I want to go. But which one should I dedicate my time to?
I don't want to waste my time; that's the reason for this question. Or have you thought of the third option yet?

Ron
P.S. This could be a lively or a quick discussion.

bplus · « **Reply #1 on:** May 27, 2019, 09:07:33 am »

Use a nameFix$ function to zero-pad file numbers that are shorter than 3 characters:

Code:

DO
    INPUT "Test a name fix, enter base name "; b$
    PRINT nameFix$(b$)
LOOP

FUNCTION nameFix$ (base$)
    IF LEN(base$) < 3 THEN nameFix$ = RIGHT$("000" + base$, 3) ELSE nameFix$ = base$
END FUNCTION

You will, of course, have to isolate the critical section to feed to nameFix$, e.g. separate the path, the extension and the file base name...

Is there really a chance that a file001.ext already exists for file1.ext?
PS I think I am leaning towards Path #2; it's a good exercise in isolating the path, file base name, number and extension. Dang, what was the name of that reverse INSTR?
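The reverse INSTR in QB64 is _INSTRREV, which returns the position of the last occurrence of a substring. A minimal sketch of the isolation step bplus describes, assuming Windows-style "\" separators and a hypothetical path: the last "\" ends the folder, the last "." starts the extension, and trailing digits are peeled off the base name before padding. The base-name variable is called nm$ here to stay clear of the BASE keyword mentioned below.

```qb64
' Split a full path into folder, name prefix, number and extension.
' Assumptions: Windows "\" separators; the path below is hypothetical.
f$ = "C:\manga\ch01\page7.jpg"
slash = _INSTRREV(f$, "\") ' last path separator (0 if none)
dot = _INSTRREV(f$, ".") ' last dot starts the extension
IF dot <= slash THEN dot = LEN(f$) + 1 ' no extension after the folder part

folder$ = LEFT$(f$, slash)
ext$ = MID$(f$, dot)
nm$ = MID$(f$, slash + 1, dot - slash - 1) ' base name without extension

' Walk backwards over trailing digits to find where the number starts
k = LEN(nm$)
DO WHILE k > 0
    c$ = MID$(nm$, k, 1)
    IF c$ < "0" OR c$ > "9" THEN EXIT DO
    k = k - 1
LOOP
prefix$ = LEFT$(nm$, k)
num$ = MID$(nm$, k + 1)
IF LEN(num$) > 0 AND LEN(num$) < 3 THEN num$ = RIGHT$("000" + num$, 3)

newName$ = folder$ + prefix$ + num$ + ext$
PRINT f$; " -> "; newName$ ' C:\manga\ch01\page7.jpg -> C:\manga\ch01\page007.jpg
```

Only the trailing digits are padded, so numbers inside folder names such as ch01 are left alone.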

Something odd about Code: QB64:
Hey! How did base$ become BASE$??? Does the QB64 code box think it's a keyword?

Code: QB64:

DO
    INPUT "Test a name fix, enter base name "; b$
    PRINT nameFix$(b$)
LOOP
 
FUNCTION nameFix$ (base$)
    IF LEN(base$) < 3 THEN nameFix$ = RIGHT$("000" + base$, 3) ELSE nameFix$ = base$
END FUNCTION
 

FellippeHeitor · « **Reply #2 on:** May 27, 2019, 09:20:17 am »

BASE is a keyword in OPTION BASE. There's no way to differentiate it for the syntax highlighter.

Pete · « **Reply #3 on:** May 27, 2019, 09:38:03 am »

My take on the unordered shell results would be to change them into arrays, and then rename them. To get them in a true order, I'd do this...

Code: QB64:

WIDTH 80, 42
_SCREENMOVE 0, 0
REDIM arraynam$(6), arraynum$(6), origarray$(6)
FOR i = 1 TO 6
    READ a$
    origarray$(i) = a$
    PRINT a$
    FOR j = LEN(a$) TO 1 STEP -1
        b$ = MID$(a$, j, 1)
        IF b$ < "0" OR b$ > "9" THEN EXIT FOR
        arraynum$(i) = b$ + arraynum$(i)
    NEXT
    arraynam$(i) = MID$(a$, 1, j)
    arraynum$(i) = STRING$(3 - LEN(arraynum$(i)), "0") + arraynum$(i)
NEXT
PRINT
FOR i = 1 TO 6
    PRINT arraynam$(i); arraynum$(i)
NEXT
PRINT
' Sort...
FOR i = 1 TO 6
    FOR j = 1 TO 6
        IF i <> j THEN
            IF arraynum$(i) < arraynum$(j) THEN
                SWAP arraynum$(i), arraynum$(j)
                SWAP arraynam$(i), arraynam$(j)
                SWAP origarray$(i), origarray$(j)
            END IF
        END IF
    NEXT
NEXT
FOR i = 1 TO 6
    PRINT arraynam$(i); arraynum$(i)
NEXT
PRINT
' Rename demo with NAME AS...
FOR i = 1 TO 6
    PRINT "NAME "; origarray$(i); " AS "; arraynam$(i) + arraynum$(i)
NEXT
 
DATA pete1,pete10,pete11,pete20,pete30,pete9
 

For this demo I used DATA instead of a SHELL call, but you get the idea. Now you are only renaming files you know exist, so QB64's NAME ... AS statement could be used, or you could even SHELL out to ren.
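A hedged sketch of the SHELL variant Pete alludes to: instead of DATA, a directory listing saved to a text file is read line by line into a growing array with REDIM _PRESERVE. The file name listing_demo.txt is hypothetical. So that the sketch runs without a real folder, it first writes Pete's sample names to that file; on Windows the commented SHELL line shows one way the listing might be produced, and the redirect target may itself show up in the listing.

```qb64
' Read a directory listing file into a dynamic array.
' Assumption: on Windows the listing could come from something like
'     SHELL _HIDE "dir /b > listing_demo.txt"
' Here a sample listing is written first so the sketch runs anywhere.
OPEN "listing_demo.txt" FOR OUTPUT AS #1
FOR i = 1 TO 6
    READ a$
    PRINT #1, a$
NEXT
CLOSE #1
DATA pete1,pete10,pete11,pete20,pete30,pete9

' Grow the array one entry per non-empty line
REDIM listing$(0)
count = 0
OPEN "listing_demo.txt" FOR INPUT AS #1
DO WHILE NOT EOF(1)
    LINE INPUT #1, l$
    IF LEN(l$) > 0 THEN
        count = count + 1
        REDIM _PRESERVE listing$(count)
        listing$(count) = l$
    END IF
LOOP
CLOSE #1

PRINT count; "names read:"
FOR i = 1 TO count: PRINT listing$(i): NEXT
```

From here, listing$() can take the place of the READ a$ loop in Pete's demo.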

As for speed comparisons, I have no clue. I've always felt computers work fast enough as long as I provide a method, and I seldom have tasks where optimizing for speed would make much difference. I suppose I'd have to build and test two models, so we'd be in the same boat there, sorry. At first glance, this is just the way I would go about it.

Pete

doppler · « **Reply #4 on:** May 27, 2019, 01:21:55 pm »

Quote from: bplus on May 27, 2019, 09:07:33 am

Is there really a chance that a file001.ext already exists for file1.ext?

Not really, and I have seen many postings of manga chapters. Kind posters sequence things as ##_originalfilename.ext.
This brings all Tumblr-style renames into order. Tumblr names, like Yahoo shortcuts, are unique filenames with no particular rhyme or reason:
a hash name of sorts.

The real saving grace is that most manga chapters have fewer than 100 media files in total. There are rare exceptions.
That gives a safe starting point to expect: manga media names are either completely different (based on the manga name and chapter) or just numbers.

I have two programs to handle names: a pre-process program that handles just the numeric names, which is what we are discussing now,
and another that lets me view the files for errors before renaming them 001, 002, 003 ...

All funny names found in a chapter directory get renamed by the second program. If the names are already 001 ..., I just exit that chapter.
The funny names are just that: too complex to predict and massage.

Nice to see some code posted. Beyond expectation.
I will look at all of it.

I have always leaned towards process #2. I am not sure all the expanded "C" code would be faster than the blind code of process #1.
What I have now is only fast because of my fast computer. I know the results of removing dumb, quick-and-dirty code:
I have rewritten that kind of stuff only to catch a glimpse of the execution box flashing on the screen, making me think it did not happen.

I want to do the same for this renaming predicament. Even with the 2 people who know the most about QB64 internals
giving some samples, I am still not sure of the best direction to take.

If you want to try reading some of this stuff (translated into many languages), it's on:
https://mangadex.org/ At the moment MangaDex is free of ads and begging: no pop-ups, pop-unders, pop-arounds or new pages.
So much so that it is targeted for DDoS by others who want to profit from manga or make MD go away.
I, who would not donate to any group or website, have donated for MD's upkeep. They are just that good.
BTW, I use the same moniker on MD as here.

doppler · « **Reply #5 on:** May 27, 2019, 09:39:36 pm »

Just for giggles, I tried option #1.

There were 2 error conditions, one unlikely and one likely. For speed, I tested only for the likely one.
I will let the program crash on the unlikely one. From what I have experienced in the past, it happens only about 0.1% of the time.
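If a crash on the rare case ever becomes inconvenient, one hedged alternative is QB64's ON ERROR GOTO trap around NAME, which records the error and carries on. The file names below are hypothetical, and the source file is deliberately missing so that NAME fails and the handler runs; the sketch just stores ERR rather than assuming which error number QB64 reports.

```qb64
' Trap a failing NAME instead of letting the program crash.
' Assumption: "no_such_page_demo.jpg" does not exist (hypothetical name).
GOTO Main

RenameFailed:
renameError = ERR ' remember the error code, then continue after NAME
RESUME NEXT

Main:
renameError = 0
ON ERROR GOTO RenameFailed
NAME "no_such_page_demo.jpg" AS "001_demo.jpg"
ON ERROR GOTO 0 ' back to normal error handling
IF renameError THEN
    PRINT "Rename skipped, error"; renameError
ELSE
    PRINT "Renamed"
END IF
```

The handler has to live in the main module, which is why the sketch jumps over it with GOTO Main.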

Thanks for the examples submitted.
BTW, for reference, _FILEEXISTS is insanely fast.
Playing with strings may have been faster, but I would have to put in a lot of effort to prove it.
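One low-effort way to put a number on that: a rough timing sketch that probes names 1 to 999 with three extensions in the current folder and measures the loop with TIMER(.001). The extensions are hypothetical, and results depend heavily on the disk and the operating system's file cache, so any figure only holds for the machine it ran on.

```qb64
' Rough timing of Path #1 style probing with _FILEEXISTS.
' Assumptions: extensions .jpg/.png/.gif (hypothetical), current folder.
DIM ext$(1 TO 3)
ext$(1) = ".jpg": ext$(2) = ".png": ext$(3) = ".gif"

hits = 0
t0# = TIMER(.001)
FOR n = 1 TO 999
    n$ = LTRIM$(STR$(n))
    FOR e = 1 TO 3
        IF _FILEEXISTS(n$ + ext$(e)) THEN hits = hits + 1
    NEXT
NEXT
elapsed# = TIMER(.001) - t0#
IF elapsed# < 0 THEN elapsed# = elapsed# + 86400 ' TIMER wraps at midnight

PRINT "2997 probes took"; elapsed#; "seconds;"; hits; "files found"
```

The same TIMER bracket can be placed around a string-parsing version to compare the two paths on real chapter folders.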

I have always believed that a programmer who thinks about a problem beforehand is better than one who just throws code
at it. So this was not a waste: it got me to think of another way to try option #1.

SMcNeill · « **Reply #6 on:** May 27, 2019, 09:59:47 pm »

I think the way I’d do something like this would be:

1) Shell DIR to a text file.
2) Read that file into an array.
3) Parse that Array for numeric values. If those values aren’t in ### format, create a new array to store the properly formatted name.
4) When finished, use a DO LOOP to NAME the files into the proper format in one quick batch.
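Steps 3 and 4 above could be sketched in QB64 as a dry run, assuming the names come from a "dir /b" listing (simulated here with DATA) and that purely numeric names should be 3 digits wide. The constant doRename is a hypothetical safety switch: while it is 0, the batch pass only prints the NAME ... AS statements it would run.

```qb64
' Steps 3 and 4 as a dry run over an array of bare file names.
' Assumptions: names as "dir /b" would list them; target width is 3 digits.
CONST doRename = 0 ' set to -1 to actually call NAME ... AS
DIM oldName$(1 TO 5), newName$(1 TO 5)
FOR i = 1 TO 5: READ oldName$(i): NEXT
DATA 1.jpg,10.jpg,2.png,100.gif,cover.jpg

' Step 3: build the new-name array (empty string = nothing to do)
FOR i = 1 TO 5
    dot = _INSTRREV(oldName$(i), ".")
    IF dot = 0 THEN dot = LEN(oldName$(i)) + 1 ' no extension
    stem$ = LEFT$(oldName$(i), dot - 1)
    ext$ = MID$(oldName$(i), dot)
    allDigits = LEN(stem$) > 0
    FOR k = 1 TO LEN(stem$)
        c$ = MID$(stem$, k, 1)
        IF c$ < "0" OR c$ > "9" THEN allDigits = 0
    NEXT
    IF allDigits AND LEN(stem$) < 3 THEN newName$(i) = RIGHT$("000" + stem$, 3) + ext$
NEXT

' Step 4: one batch pass over the arrays
FOR i = 1 TO 5
    IF newName$(i) <> "" THEN
        IF doRename THEN
            NAME oldName$(i) AS newName$(i)
        ELSE
            PRINT "NAME "; oldName$(i); " AS "; newName$(i)
        END IF
    END IF
NEXT
```

Names that are already 3 digits wide (100.gif) or not numeric at all (cover.jpg) are left out of the batch.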

Should be about the fastest way to process your files, I’d think.

You might want to take a look at the little program I wrote to manipulate Ebook titles into a proper format, much like what you’re describing you want to do here: http://qb64.freeforums.net/thread/18/hard-drive-folder-organization-tool

doppler · « **Reply #7 on:** May 28, 2019, 09:44:23 am »

Quote from: SMcNeill on May 27, 2019, 09:59:47 pm

I think the way I’d do something like this would be:

1) Shell DIR to a text file.
2) Read that file into an array.
3) Parse that Array for numeric values. If those values aren’t in ### format, create a new array to store the properly formatted name.
4) When finished, use a DO LOOP to NAME the files into the proper format in one quick batch.

Should be about the fastest way to process your files, I’d think.

You might want to take a look at the little program I wrote to manipulate Ebook titles into a proper format, much like what you’re describing you want to do here: http://qb64.freeforums.net/thread/18/hard-drive-folder-organization-tool

Yes, that would have been process #2. But I am lazy as f**k. Process #1 proved insanely fast to execute and much easier to create.
It was so fast I had to play notes to prove it was doing something. Even then I barely get a peep out of it before it exits; PLAY is cancelled on exit.

I have always found useful ideas by reading the replies to problems. I hope this helps someone else too.

Petr · « **Reply #8 on:** May 28, 2019, 12:07:43 pm »

Hi. If I got it right, this is the primary problem you're asking about. Of course, I would do it with an array holding all the files; just put this in a loop. I would sort it out like this:

Code: QB64:

 
TotalFiles$ = "100000" ' Total number of files in the directory. To see what this code does, change it to 10000, 1000, 100 or 10
PreviousName$ = "File - 1" ' Original file name as returned by DIR; it must contain "-"
Start = INSTR(1, PreviousName$, "-") ' Position where the name ends and the number starts
CurrentNumLen = LEN(RIGHT$(PreviousName$, LEN(PreviousName$) - Start)) ' Current length of the number part (including the space after "-")
CurrentNameLen = LEN(PreviousName$) - CurrentNumLen ' Current length of the name part
 
LeftSide$ = LEFT$(PreviousName$, CurrentNameLen) ' Split the file name into its name...
RightSide$ = LTRIM$(RIGHT$(PreviousName$, CurrentNumLen)) ' ...and its number, without the leading space
 
NewZeroesLenght = LEN(TotalFiles$) - LEN(RightSide$) ' Number of zeros needed, based on the number of files in the directory
NewName$ = LeftSide$ + " " + STRING$(NewZeroesLenght, "0") + RightSide$ ' Create the new file name
 
PRINT "Original File Name: "; PreviousName$
PRINT "New File Name: "; NewName$
PRINT "Total files:"; VAL(TotalFiles$)
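Petr's idea of sizing the padding from the file count can also be written without a hard-coded string: the width is simply the number of digits in the count, LEN(LTRIM$(STR$(totalFiles))). A minimal sketch, assuming totalFiles would come from counting the directory listing (the value 250 is hypothetical):

```qb64
' Derive the zero-pad width from the file count instead of a literal string.
' Assumption: totalFiles would come from counting the directory listing.
totalFiles = 250
padWidth = LEN(LTRIM$(STR$(totalFiles))) ' 250 has 3 digits

DIM padded$(1 TO 3)
FOR i = 1 TO 3
    READ n
    n$ = LTRIM$(STR$(n))
    padded$(i) = STRING$(padWidth - LEN(n$), "0") + n$
    PRINT "File - "; n$; " -> File - "; padded$(i)
NEXT
DATA 1,42,250
```

With 250 files this gives 001, 042 and 250; a folder of 1200 files would get 4-digit names automatically.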
