Topic: Theoretical question of speed. AKA: Which process path to follow? (Read 3360 times)


doppler

  • Forum Regular
  • Posts: 241
In my obsession with reading and collecting manga (Japanese 2D stories), I always have trouble with the names of the media files. It's not a big deal when everything is named image###.ext. The problem starts when only numbers are used. It's nice to read the stuff in sequence with a known pattern, so I rename everything to ##1, ##2, ##3 ... Here is the problem: how to handle sets like 1,2,3,4,5,6,7,8,9,10,11,12 or 01,02,03 ... 99,100,101,102. Can you see the problem yet?
If not, I will help. Sorted lists from a redirected, shelled DIR look like this: 1,10,11 ... 2,20,21 ... 3,30,31. It's even worse when 3-digit numbers are in the set along with 1-digit and 2-digit numbers. Right now I have three different extensions that need renaming, so I am doing it dirty by shelling out to a common BAT file that blindly renames everything, whether it exists or not. Slow and very inefficient.
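To make the sorting problem concrete, here is a minimal QB64 sketch (the sample numbers are made up) that sorts the same names twice: once as plain strings, which gives the 1, 10, 100, 11, 2 ... order, and once after padding each number to 3 digits with RIGHT$("000" + n$, 3), which sorts in true numeric order. It assumes no number has more than 3 digits.

```qb64
' Compare plain string order with zero-padded string order.
' Sample numbers are illustrative only.
DIM raw$(1 TO 7), padded$(1 TO 7)
DIM i AS INTEGER, j AS INTEGER
FOR i = 1 TO 7
    READ raw$(i)
    ' Pad to 3 digits: "7" becomes "007", "100" stays "100"
    padded$(i) = RIGHT$("000" + raw$(i), 3)
NEXT
DATA 1,10,100,11,2,20,3

' Simple exchange sort, applied to each array with plain string comparison
FOR i = 1 TO 6
    FOR j = i + 1 TO 7
        IF raw$(j) < raw$(i) THEN SWAP raw$(i), raw$(j)
        IF padded$(j) < padded$(i) THEN SWAP padded$(i), padded$(j)
    NEXT
NEXT

PRINT "Unpadded:";
FOR i = 1 TO 7: PRINT " "; raw$(i);: NEXT
PRINT
PRINT "Padded:  ";
FOR i = 1 TO 7: PRINT " "; padded$(i);: NEXT
PRINT
```

The unpadded list comes out as 1 10 100 11 2 20 3, while the padded list comes out as 001 002 003 010 011 020 100.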

I have 2 process paths to think about (maybe you can think of a third).

Path #1:
Check whether each possible filename exists, for all three extensions, and rename it when found. This is another blind rename process: test all possibilities and rename.
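A rough sketch of Path #1 in QB64, assuming the three extensions are .jpg, .png and .gif (the post doesn't name them) and that numbers run 1 to 99. It uses _FILEEXISTS to skip names that aren't there and to avoid overwriting an existing 001-style file. So that it runs on its own, it works in a scratch folder called rename_demo (a made-up name) and creates a few empty sample files first; drop those lines for real use.

```qb64
' Path #1 sketch: probe every short numeric name, rename only the ones that exist.
' ASSUMPTIONS (not from the post): extensions .jpg/.png/.gif, numbers 1..99,
' and a scratch folder "rename_demo" with empty sample files so the demo runs alone.
IF NOT _DIREXISTS("rename_demo") THEN MKDIR "rename_demo"
CHDIR "rename_demo"
OPEN "1.jpg" FOR OUTPUT AS #1: CLOSE #1
OPEN "10.png" FOR OUTPUT AS #1: CLOSE #1
OPEN "2.gif" FOR OUTPUT AS #1: CLOSE #1

DIM ext$(1 TO 3)
ext$(1) = ".jpg": ext$(2) = ".png": ext$(3) = ".gif"
DIM n AS INTEGER, e AS INTEGER, renamed AS INTEGER
DIM oldName$, newName$
FOR n = 1 TO 99
    FOR e = 1 TO 3
        oldName$ = LTRIM$(STR$(n)) + ext$(e)
        newName$ = RIGHT$("000" + LTRIM$(STR$(n)), 3) + ext$(e)
        ' Skip names that do not exist, and never overwrite an existing target
        IF _FILEEXISTS(oldName$) AND NOT _FILEEXISTS(newName$) THEN
            NAME oldName$ AS newName$
            renamed = renamed + 1
        END IF
    NEXT
NEXT
PRINT renamed; "file(s) renamed"
```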

Path #2:
Collect the directory contents into an array in memory. Test, and rename only what is found. Intelligent rename. Could be faster.

Pros and cons of the process paths (calling them #1 and #2):

#1 Pro: No name testing; the system caches filenames, so following rename calls should be faster.
#1 Con: No name testing, so I have to try every possibility.
#1 Con: A possible error is thrown every time a bad name is tried, e.g. renaming 1.ext to 001.ext when 001.ext already exists.

#2 Pro: Faster name testing.  Only attempt to rename required files.
#2 Con: String searching, parsing and memory array usage.

Path #1 would be less effort, but the result may be the same as what I am doing now.
Path #2 takes more of my time to create, but it could still take the same time to execute as now. Wasted effort.

OK, either process will get me where I want to go. But which one should I dedicate my time to?
I don't want to waste my time; that's the reason for this question. Or have you thought of the third option yet?

Ron
P.S. This could be a lively or a quick discussion.

bplus

  • Global Moderator
  • Forum Resident
  • Posts: 8053
  • b = b + ...
Use a nameFix$ function to pad file numbers that are shorter than 3 characters:
Code:
DO
    INPUT "Test a name fix, enter base name "; b$
    PRINT nameFix$(b$)
LOOP

FUNCTION nameFix$ (base$)
    IF LEN(base$) < 3 THEN nameFix$ = RIGHT$("000" + base$, 3) ELSE nameFix$ = base$
END FUNCTION

You will of course have to isolate the critical section to feed to nameFix$, e.g. split the path and extension off the file base name.

Is there really a chance that a file001.ext already exists for file1.ext?
P.S. I think I am leaning towards Path #2; it's a good exercise to isolate the path, file base name, number and extension. Dang, what was the name of that reverse INSTR?
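QB64's reverse INSTR is _INSTRREV. A minimal sketch of the isolating step, assuming a Windows-style sample path (hypothetical): everything up to the last \ or / is the folder, everything from the last dot on is the extension, and only what remains gets padded, and only if it is all digits. The folder name itself contains digits here, which is exactly why the base name has to be isolated before padding.

```qb64
' Split a full file name into folder, base name and extension, then pad the
' base name only if it is purely numeric. The sample path is hypothetical.
DIM full$, folder$, baseName$, ext$, fixed$
DIM slashPos AS LONG, dotPos AS LONG, i AS LONG
DIM allDigits AS INTEGER
full$ = "C:\manga\chapter 12\7.jpg"

' Last path separator (accept either \ or /)
slashPos = _INSTRREV(full$, "\")
IF _INSTRREV(full$, "/") > slashPos THEN slashPos = _INSTRREV(full$, "/")
folder$ = LEFT$(full$, slashPos) ' keeps the trailing separator
baseName$ = MID$(full$, slashPos + 1)

' Last dot starts the extension (no extension if there is no dot)
dotPos = _INSTRREV(baseName$, ".")
IF dotPos > 0 THEN
    ext$ = MID$(baseName$, dotPos) ' includes the dot
    baseName$ = LEFT$(baseName$, dotPos - 1)
END IF

' Pad only names made entirely of digits; leave everything else alone
allDigits = (LEN(baseName$) > 0)
FOR i = 1 TO LEN(baseName$)
    IF INSTR("0123456789", MID$(baseName$, i, 1)) = 0 THEN allDigits = 0
NEXT
fixed$ = baseName$
IF allDigits AND LEN(baseName$) < 3 THEN fixed$ = RIGHT$("000" + baseName$, 3)

PRINT folder$; "|"; baseName$; "|"; ext$; " -> "; folder$ + fixed$ + ext$
```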


Something odd about Code = QB64:
Hey! How did base$ become BASE$??? Does the QB64 code box think it's a keyword?
Code: QB64:
DO
    INPUT "Test a name fix, enter base name "; b$
    PRINT nameFix$(b$)
LOOP

FUNCTION nameFix$ (base$)
    IF LEN(base$) < 3 THEN nameFix$ = RIGHT$("000" + base$, 3) ELSE nameFix$ = base$
END FUNCTION
« Last Edit: May 27, 2019, 09:41:32 am by bplus »

FellippeHeitor

  • Guest
BASE is a keyword in OPTION BASE. There's no way to differentiate it for the syntax highlighter.

Pete

  • Forum Resident
  • Posts: 2361
  • Cuz I sez so, varmint!
My take on the unordered SHELL results would be to put them into arrays, and then rename them. To get them in true order, I'd do this...

Code: QB64:
WIDTH 80, 42
REDIM arraynam$(6), arraynum$(6), origarray$(6)
FOR i = 1 TO 6
    READ a$
    origarray$(i) = a$
    PRINT a$
    ' Walk back from the end of the name collecting the trailing digits
    FOR j = LEN(a$) TO 1 STEP -1
        b$ = MID$(a$, j, 1)
        IF b$ < "0" OR b$ > "9" THEN EXIT FOR
        arraynum$(i) = b$ + arraynum$(i)
    NEXT
    arraynam$(i) = MID$(a$, 1, j)
    ' Pad the number part to 3 digits
    arraynum$(i) = STRING$(3 - LEN(arraynum$(i)), "0") + arraynum$(i)
NEXT
FOR i = 1 TO 6
    PRINT arraynam$(i); arraynum$(i)
NEXT
' Sort...
FOR i = 1 TO 6
    FOR j = 1 TO 6
        IF i <> j THEN
            IF arraynum$(i) < arraynum$(j) THEN
                SWAP arraynum$(i), arraynum$(j)
                SWAP arraynam$(i), arraynam$(j)
                SWAP origarray$(i), origarray$(j)
            END IF
        END IF
    NEXT
NEXT
FOR i = 1 TO 6
    PRINT arraynam$(i); arraynum$(i)
NEXT
' Rename demo with NAME AS...
FOR i = 1 TO 6
    PRINT "NAME "; origarray$(i); " AS "; arraynam$(i) + arraynum$(i)
NEXT

DATA pete1,pete10,pete11,pete20,pete30,pete9

For this demo I used DATA instead of a SHELL call, but you get the idea. Now you are only renaming files you know exist, so the QB64 NAME ... AS statement could be used, or you could even SHELL out to REN.

As for speed comparisons, I have no clue. I've always felt computers work fast enough as long as I provide a method, and I seldom have tasks where optimizing for speed would have much impact. I suppose I'd have to make and test two models, so we'd be in the same boat there, sorry. At first glance, this is just the way I would go about it.
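If you do build two models, one simple way to compare them is to time each with TIMER. A minimal sketch timing Path #1's blind probes, assuming 3 extensions and numbers 1 to 999 (both hypothetical) probed in the current folder; the same t0/elapsed wrapper could go around a Path #2 version. TIMER resets at midnight, so a run that crosses midnight gives a bogus result.

```qb64
' Rough timing harness for the blind _FILEEXISTS probing of Path #1.
' ASSUMPTIONS: extensions .jpg/.png/.gif and numbers 1..999 (hypothetical).
DIM ext$(1 TO 3)
ext$(1) = ".jpg": ext$(2) = ".png": ext$(3) = ".gif"
DIM n AS LONG, e AS LONG, probes AS LONG, hits AS LONG
DIM t0 AS DOUBLE, elapsed AS DOUBLE
t0 = TIMER(.001) ' millisecond accuracy; wraps at midnight
FOR n = 1 TO 999
    FOR e = 1 TO 3
        probes = probes + 1
        IF _FILEEXISTS(LTRIM$(STR$(n)) + ext$(e)) THEN hits = hits + 1
    NEXT
NEXT
elapsed = TIMER(.001) - t0
PRINT probes; "probes,"; hits; "hits in"; elapsed; "seconds"
```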

Pete
« Last Edit: May 27, 2019, 09:58:24 am by Pete »
Want to learn how to write code on cave walls? https://www.tapatalk.com/groups/qbasic/qbasic-f1/

doppler

  • Forum Regular
  • Posts: 241
Quote from bplus: "Is there really a chance that a file001.ext already exists for file1.ext?"

Not really, and I have seen many postings of manga chapters. Kind posters sequence things as ##_originalfilename.ext,
which brings all the Tumblr renames into order. Tumblr names, like Yahoo shortcuts, are unique filenames with no particular rhyme or reason:
a hash name of sorts.

The real saving grace is that most manga chapters have fewer than 100 media files in total; bigger postings are rare.
That gives a safe starting point: manga media names are either totally different (based on the manga name and chapter) or just numbers.

I have two programs to handle names: a pre-process program that handles just the names, which is what we are discussing now,
and another that lets me view the files for errors before renaming them 001, 002, 003 ...

All funny names found in a chapter directory get renamed by the second program. If the names are already 001 ..., I just exit that chapter.
The funny names are just that: too complex to predict and massage.
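The split between the two programs could be sketched like this, assuming the extension has already been stripped and using made-up sample names: all-digit names shorter than 3 characters get padded, all-digit names of 3 or more characters count as already done, and anything else is a "funny" name left for the second program.

```qb64
' Classify base names (extension already removed) into three groups.
' Sample names are hypothetical.
DIM entries$(1 TO 5)
DIM i AS INTEGER, c AS INTEGER, isNum AS INTEGER
DIM toPad AS INTEGER, alreadyDone AS INTEGER, funny AS INTEGER
FOR i = 1 TO 5: READ entries$(i): NEXT
DATA "7","42","003","tumblr_ab12cd","page 9"
FOR i = 1 TO 5
    ' Numeric only if non-empty and every character is a digit
    isNum = (LEN(entries$(i)) > 0)
    FOR c = 1 TO LEN(entries$(i))
        IF INSTR("0123456789", MID$(entries$(i), c, 1)) = 0 THEN isNum = 0
    NEXT
    IF NOT isNum THEN
        funny = funny + 1
        PRINT entries$(i); " -> funny name, leave for the second program"
    ELSEIF LEN(entries$(i)) < 3 THEN
        toPad = toPad + 1
        PRINT entries$(i); " -> "; RIGHT$("000" + entries$(i), 3)
    ELSE
        alreadyDone = alreadyDone + 1
        PRINT entries$(i); " -> already padded"
    END IF
NEXT
```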

Nice to see some code posted, beyond my expectations.
I will look at all of it.

I have always leaned towards process #2, but I am not sure all the expanded "C" code would be faster than the blind code of process #1.
What I have now is only fast because of my fast computer. I know the results of removing dumb, quick-and-dirty code:
I have rewritten that kind of stuff only to catch a glimpse of the execution box flashing on the screen, making me think it did not happen.

I wish to do the same for this renaming predicament. Even with samples from two of the people who know the most about QB64 internals,
I am still not sure which direction is best.

If you want to try reading some of this stuff (translated into many languages), it's on
https://mangadex.org/. At the moment MangaDex is free of ads and begging: no pop-ups, pop-unders, pop-arounds or new pages.
So much so that they are targeted with DDoS attacks by others wishing to profit from manga or make MD go away.
I, who would not donate to any group or website, have donated to MD upkeep. They are just that good.
BTW, I use the same moniker on MD as here.


doppler

  • Forum Regular
  • Posts: 241
Just for giggles, I tried option #1.

There were 2 error conditions, one unlikely and one likely. For speed I tested only for the likely one.
I will let the program crash for the unlikely one; from what I have experienced in the past, it happens only about 0.1% of the time.
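If crashing on the unlikely case ever gets annoying, QB64's ON ERROR GOTO can turn it into a skip instead. A minimal sketch; the source file name is hypothetical and assumed not to exist, so NAME fails on purpose and the handler counts the failure, then RESUME NEXT continues with the statement after NAME.

```qb64
' Trap a failed NAME instead of letting the program crash.
' "no_such_file_1.jpg" is a hypothetical name assumed NOT to exist.
DIM failures AS INTEGER
ON ERROR GOTO RenameFailed
NAME "no_such_file_1.jpg" AS "001.jpg"
ON ERROR GOTO 0 ' turn error trapping back off
PRINT failures; "rename(s) failed and were skipped"
GOTO AfterHandler

RenameFailed:
failures = failures + 1
PRINT "NAME failed with error"; ERR
RESUME NEXT

AfterHandler:
```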

Thanks for the examples submitted.
BTW, for reference, _FILEEXISTS is insanely fast.
Playing with strings may have been faster, but I would have to exert lots of effort to prove it.

I have always believed the programmer who thinks about it beforehand is better than the programmer who just throws code
at the problem. So this was not a waste: it got me to think of another way to try option #1.

SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
    • Steve’s QB64 Archive Forum
I think the way I’d do something like this would be:

1) Shell DIR to a text file.
2) Read that file into an array.
3) Parse that Array for numeric values.  If those values aren’t in ### format, create a new array to store the properly formatted name.
4) When finished, use a DO LOOP to NAME the files into the proper format in one quick batch.
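A rough QB64 sketch of those four steps, under a few assumptions: a scratch folder (steps_demo, a made-up name) with three empty sample files stands in for a chapter folder; the listing comes from dir /b on Windows or ls -1 elsewhere, picked with QB64's $IF WIN metacommand; at most 1000 names match; and only 1- or 2-digit base names are padded.

```qb64
' Path #2 sketch following the four steps above.
' ASSUMPTIONS: scratch folder with sample files, "dir /b" or "ls -1" listing,
' at most 1000 matching names.
IF NOT _DIREXISTS("steps_demo") THEN MKDIR "steps_demo"
CHDIR "steps_demo"
OPEN "3.jpg" FOR OUTPUT AS #1: CLOSE #1
OPEN "12.jpg" FOR OUTPUT AS #1: CLOSE #1
OPEN "cover.jpg" FOR OUTPUT AS #1: CLOSE #1

' 1) Shell DIR to a text file
$IF WIN THEN
    SHELL _HIDE "dir /b > listing.txt"
$ELSE
    SHELL _HIDE "ls -1 > listing.txt"
$END IF

' 2) + 3) Read the listing and queue every 1- or 2-digit name with its padded form
DIM oldName$(1 TO 1000), newName$(1 TO 1000)
DIM nRename AS INTEGER, i AS INTEGER, c AS INTEGER, dotPos AS INTEGER, isNum AS INTEGER
DIM entry$, stem$
OPEN "listing.txt" FOR INPUT AS #1
DO UNTIL EOF(1)
    LINE INPUT #1, entry$
    entry$ = _TRIM$(entry$)
    dotPos = _INSTRREV(entry$, ".")
    IF dotPos > 1 AND dotPos <= 3 THEN ' base name is 1 or 2 characters
        stem$ = LEFT$(entry$, dotPos - 1)
        isNum = -1
        FOR c = 1 TO LEN(stem$)
            IF INSTR("0123456789", MID$(stem$, c, 1)) = 0 THEN isNum = 0
        NEXT
        IF isNum THEN
            nRename = nRename + 1
            oldName$(nRename) = entry$
            newName$(nRename) = RIGHT$("000" + stem$, 3) + MID$(entry$, dotPos)
        END IF
    END IF
LOOP
CLOSE #1

' 4) Rename the queued files in one batch, never overwriting an existing target
FOR i = 1 TO nRename
    IF NOT _FILEEXISTS(newName$(i)) THEN NAME oldName$(i) AS newName$(i)
NEXT
PRINT nRename; "file(s) queued for renaming"
```

Names like cover.jpg and listing.txt are never queued, because their base names are longer than 2 characters.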

Should be about the fastest way to process your files, I’d think. 

You might want to take a look at the little program I wrote to manipulate Ebook titles into a proper format, much like what you’re describing you want to do here:  http://qb64.freeforums.net/thread/18/hard-drive-folder-organization-tool

https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

doppler

  • Forum Regular
  • Posts: 241
Quote from SMcNeill: (his four-step DIR / array / NAME plan above)

Yes, that would have been process #2. But I am lazy as f**k. Process #1 proved to be insanely fast to execute and much easier to create.
It was so insanely fast I had to play notes to prove it was doing something. Even then I barely get a peep out of it before it exits; PLAY cancels on exit.

I have always found useful ideas by reading comments on problems. I hope this helps someone else too.

Petr

  • Forum Resident
  • Posts: 1720
  • The best code is the DNA of the hops.
Hi. If I got it right, this is the primary problem you're asking about. Of course, I would do it with an array holding all the files; just put this in a loop. I would sort it out like this:

Code: QB64:
TotalFiles$ = "100000" '                             Total number of files in the directory; try 10000, 1000, 100 or 10 to see what this code does
PreviousName$ = "File - 1" '                         Original file name as returned by DIR; must contain "-"
Start = INSTR(1, PreviousName$, "-") '               Position where the name ends and the number starts
CurrentNumLen = LEN(PreviousName$) - Start '         Current length of the number part (including the space after "-")
CurrentNameLen = LEN(PreviousName$) - CurrentNumLen ' Current length of the name part

LeftSide$ = LEFT$(PreviousName$, CurrentNameLen) '   Split the file name into its name
RightSide$ = LTRIM$(RIGHT$(PreviousName$, CurrentNumLen)) ' and its number (leading space removed so the zero count is right)

NewZeroesLength = LEN(TotalFiles$) - LEN(RightSide$) ' Number of zeros needed, based on the number of files in the directory
NewName$ = LeftSide$ + " " + STRING$(NewZeroesLength, "0") + RightSide$ ' Create the new file name

PRINT "Original File Name: "; PreviousName$
PRINT "New File Name: "; NewName$
PRINT "Total files:"; VAL(TotalFiles$)
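Petr's "just put this in a loop" could look like this: a minimal sketch that derives the pad width from the total file count (120 here, hypothetical) and applies it to a few made-up numbers. It assumes no file number is larger than the total.

```qb64
' Derive the pad width from the total file count, then pad several numbers.
' The total and the sample numbers are hypothetical.
DIM total AS LONG, padWidth AS INTEGER, k AS INTEGER, n AS LONG
DIM padded$(1 TO 3)
total = 120
' Width = number of digits in the total: 9 -> 1, 99 -> 2, 120 -> 3
padWidth = LEN(LTRIM$(STR$(total)))
FOR k = 1 TO 3
    READ n
    padded$(k) = RIGHT$(STRING$(padWidth, "0") + LTRIM$(STR$(n)), padWidth)
    PRINT n; "->"; padded$(k)
NEXT
DATA 1,45,120
```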