Topic: Unicode and "name as" = shortcut to death re: more of a unicode problem now.

doppler
I have written a rename program that handles directories, naming them the way I want. It works as expected.

I never did like Unicode characters in file or directory names, and moments ago that hate-dread was reaffirmed. Using the "Name as" function on a single directory with only one Unicode character in its name caused my system to go tits-up. It was almost a complete lock-up. My savior was the power button, which was still able to get to "Turn off". If it matters, the Unicode character was at the end of the directory name.

I am using the 1.4 stable 32-bit version.

Other than checking by eye before running my program, is there a way to detect a file or directory name that contains one or more Unicode characters?
« Last Edit: July 26, 2020, 03:19:43 pm by doppler »

Petr
« Reply #1 on: July 26, 2020, 09:58:10 am »
You will get the correct file name, even if it contains Unicode characters, if you list files with Steve's direntry.h functions. It returns the list of files in ANSI encoding and is compatible with C++; this is the C++ way to access files. So use direntry.h to get a correct and usable Unicode file name.

Part two of the problem: how do you convert a new Unicode file name to ANSI characters so that the file name is valid? It isn't necessary; it converts itself. Try it:

Code: QB64:
  a$ = "ahoj.txt"
  ' Czech for "new file that contains lots of new characters with Czech diacritics.txt"
  b$ = "nový soubor který obsahuje plno nových znaků s českou diakritikou.txt"
  NAME a$ AS b$

You can download direntry.h here: https://www.qb64.org/forum/index.php?topic=1712.msg109492#msg109492

bplus
« Reply #2 on: July 26, 2020, 10:57:05 am »
Warning to @doppler and all following this:

The link Petr provided for direntry.h is not the latest word on using it; we hammered out another bug here:
https://www.qb64.org/forum/index.php?topic=2742.msg119842#msg119842   June 26

I am still looking for Steve's other post on the subject. This definitely should go in Infomatics Samples!

doppler
« Reply #3 on: July 26, 2020, 11:11:23 am »
Upon further investigation: your example does work, Petr, but having the names as strings does not corrupt the Unicode string. I tried a$ and b$ as is and reversed, and both worked fine. The problem comes to light with a directory created by Firefox and a WinRAR extraction: Vol. 4 Ch. 16.2 - Hero, Officially Employed ②

Using shell (dir and capture) and using Steve's direntry.h program, the string converts to "Vol. 4 Ch. 16.2 - Hero, Officially Employed ?". See the problem now? I never really got an uncorrupted Unicode string to use in the rename function, which I think highlights another possible problem with the rename function. I will just give a direct example without the string variables my program uses:

Rename "Vol. 4 Ch. 16.2 - Hero, Officially Employed ?" as "ch-016-2". My program uses rename a$ as b$, but this is how it is presented to the system.

Summary: I am not getting the true Unicode directory or file name. Using ? in a rename statement is very bad.
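
A minimal sketch of a guard that could sit in front of the rename, assuming the damaged name always arrives with a literal ? in it. On Windows, ? and * are wildcards and cannot be part of a real file or directory name, so finding one in a listed name means the listing lost a character. SafeRename is a hypothetical helper name, not part of QB64 or direntry.h; _DIREXISTS and NAME are standard QB64.

```qb64
a$ = "Vol. 4 Ch. 16.2 - Hero, Officially Employed ?"
b$ = "ch-016-2"
IF SafeRename%(a$, b$) = 0 THEN PRINT "Skipped: "; a$

' Hypothetical helper: refuse to rename when the source name is clearly damaged.
' Returns -1 if the rename was attempted, 0 if it was skipped.
FUNCTION SafeRename% (oldName$, newName$)
    ' ? and * are wildcards on Windows and cannot occur in a real name,
    ' so their presence means a character could not be represented.
    IF INSTR(oldName$, "?") > 0 OR INSTR(oldName$, "*") > 0 THEN EXIT FUNCTION
    ' Skip names that do not resolve to an existing directory.
    IF _DIREXISTS(oldName$) = 0 THEN EXIT FUNCTION
    NAME oldName$ AS newName$
    SafeRename% = -1
END FUNCTION
```

This only stops the bad call from reaching the system; it does not recover the true name, so a skipped directory still has to be renamed by hand.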

Thoughts?
This could be a real issue for languages other than "ALL" English code page 000.
PS: A reply came in as I was typing, @bplus. My point is still valid. I did not use the _cwd function. And I will use the new and improved direntry.h example from now on.
« Last Edit: July 26, 2020, 11:14:35 am by doppler »

bplus
« Reply #4 on: July 26, 2020, 01:25:59 pm »
Quote
PS: A reply came in as I was typing, @bplus. My point is still valid. I did not use the _cwd function. And I will use the new and improved direntry.h example from now on.

Yeah, I am wondering whether this issue is also a bug with using direntry.h. I will be watching here before doing anything for Samples > Infomatics.

As you noted, the previous issue was about getting different directories.
« Last Edit: July 26, 2020, 01:27:39 pm by bplus »

doppler
« Reply #5 on: July 26, 2020, 03:19:11 pm »
@bplus No, your concerns are valid. The bug in question concerns _cwd, a function that basically gets the directory name, much like whoami, and how it terminates strings in memory. My problem is with Unicode characters being converted to ?, spaces, or underscores; which one is chosen depends on the O/S. So really, using an improperly reported directory name in a renaming function became the death of me.

For the record, directory names and file names are stored in the same way. Only an attribute bit tells them apart, so the O/S knows what to do with each.
So I am where I was 7 hours ago: stuck using my eyeballs to find Unicode directory names.
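
One possible way to take the eyeballs out of it, as a rough sketch only: scan each name that the directory listing returns and flag anything that is not plain 7-bit ASCII, plus any literal ? left behind by a lossy conversion. IsSuspectName is a hypothetical helper name. The sketch assumes the listing hands back single-byte (ANSI) strings, and it cannot catch characters that were replaced with spaces or underscores, since those are legal in real names.

```qb64
PRINT IsSuspectName%("ch-016-2") ' clean name, not flagged
PRINT IsSuspectName%("Vol. 4 Ch. 16.2 - Hero, Officially Employed ?") ' flagged

' Hypothetical helper: returns -1 if n$ contains a byte above 127
' or a literal "?" (ASCII 63); otherwise returns 0.
FUNCTION IsSuspectName% (n$)
    DIM i AS LONG, c AS INTEGER
    FOR i = 1 TO LEN(n$)
        c = ASC(MID$(n$, i, 1))
        IF c > 127 OR c = 63 THEN IsSuspectName% = -1: EXIT FUNCTION
    NEXT
END FUNCTION
```

Flagged names could then be written to a list for manual handling instead of being passed on to the rename.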
« Last Edit: July 26, 2020, 03:20:49 pm by doppler »