Topic: Unicode ?

Fifi
Unicode ?
« on: September 11, 2018, 07:18:09 pm »
Hi all,

One thing that really hampers the use of QB64 is that it is limited to the basic unaccented ASCII table.

As a result, it cannot be used properly for languages other than English, and it similarly limits InForm when a different character set is needed, such as the French-specific set "à â ä é è ê ë ô ö ù û ü î ï ç" (the same problem affects many other languages, such as German, Spanish, Swedish, etc.).

So, is there a plan to change that soon?

Thank you.
Fifi
It's better to look like an idiot for a short time while asking something obvious to an expert than pretending to be smart all your life. (C) Me.

Petr
Re: Unicode ?
« Reply #1 on: September 12, 2018, 09:44:37 am »
Hi. Use the _MAPUNICODE statement to display Unicode characters.

Code: QB64:
cj ' remap ASCII codes 128-255 to the Czech (CP1250) Unicode characters
_FONT _LOADFONT("cyberbit.ttf", 40, "MONOSPACE")
PRINT "Zdravím tě v českém jazyce!" ' "Greetings to you in the Czech language!"

SUB cj
    ' Makes Czech characters display correctly on the screen.
    ' For another language you need a different DATA block. DATA blocks for
    ' more languages are in the QB64 help (Shift + F1 / Alphabetical index /
    ' _MAPUNICODE statement / Code Pages).
    RESTORE Microsoft_windows_cp1250
    FOR ASCIIcode = 128 TO 255
        READ unicode
        _MAPUNICODE unicode TO ASCIIcode
    NEXT
    EXIT SUB

    Microsoft_windows_cp1250:
    DATA 8364,0,8218,0,8222,8230,8224,8225,0,8240,352,8249,346,356,381,377
    DATA 0,8216,8217,8220,8221,8226,8211,8212,0,8482,353,8250,347,357,382,378
    DATA 160,711,728,321,164,260,166,167,168,169,350,171,172,173,174,379
    DATA 176,177,731,322,180,181,182,183,184,261,351,187,317,733,318,380
    DATA 340,193,194,258,196,313,262,199,268,201,280,203,282,205,206,270
    DATA 272,323,327,211,212,336,214,215,344,366,218,368,220,221,354,223
    DATA 341,225,226,259,228,314,263,231,269,233,281,235,283,237,238,271
    DATA 273,324,328,243,244,337,246,247,345,367,250,369,252,253,355,729
END SUB

More DATA blocks for other languages are in the IDE help. Search for _MAPUNICODE, then open Code Pages.
« Last Edit: September 12, 2018, 09:46:01 am by Petr »

FellippeHeitor
Re: Unicode ?
« Reply #2 on: September 12, 2018, 09:54:38 am »
Quote
So, is there a plan to change that soon?

It involves some deep C++ tweaking, which is not my area. The only other team member actively contributing to the code base is Luke, and given his busy college routine, don't expect it soon.

InForm has an option in the Edit menu to toggle the code page for a generated form. That may be useful to you to some extent. My programs in Portuguese (none ever shared here, as they're used for work) are all InForm-based and all display/accept characters with diacritic marks.
« Last Edit: September 12, 2018, 09:58:04 am by FellippeHeitor »

Petr
Re: Unicode ?
« Reply #3 on: September 12, 2018, 10:02:21 am »
Here is a way to use this with "autodetection" for programs used worldwide:

1) Use PowerShell to determine your locale setting:

Code: QB64:
SHELL "powershell get-culture > kbd.txt"

2) Read kbd.txt; now you know which language is in use.
3) Now use the matching Unicode DATA block.
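
The three steps could be sketched roughly as follows. This assumes Windows with PowerShell available and that the redirected output is plain text; CodePageFor is a hypothetical helper, and its short culture list is only illustrative, not a complete table. The number it returns tells you which DATA block from the QB64 help to RESTORE.

```qb64
' Step 1: run the command from the post and capture its output.
SHELL _HIDE "powershell get-culture > kbd.txt"

' Step 2: read the whole report into one string.
report$ = ""
IF _FILEEXISTS("kbd.txt") THEN
    OPEN "kbd.txt" FOR INPUT AS #1
    DO UNTIL EOF(1)
        LINE INPUT #1, row$
        report$ = report$ + " " + row$
    LOOP
    CLOSE #1
END IF

' Step 3: decide which code page table to load.
PRINT "Suggested code page:"; CodePageFor(report$)

FUNCTION CodePageFor (report$)
    ' Hypothetical helper: scans the report for a few known culture names.
    CodePageFor = 1252 ' Western European fallback (it-IT, fr-FR, ...)
    IF INSTR(report$, "cs-CZ") THEN CodePageFor = 1250
    IF INSTR(report$, "pl-PL") THEN CodePageFor = 1250
    IF INSTR(report$, "hu-HU") THEN CodePageFor = 1250
END FUNCTION
```

Matching on the culture name ("cs-CZ", "it-IT") rather than on the LCID number keeps the lookup readable.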

TempodiBasic
Re: Unicode ?
« Reply #4 on: September 12, 2018, 03:48:02 pm »
Hi guys,

About Unicode: we must distinguish between two issues...

1) Unicode in the QB64 IDE
2) Unicode in programs compiled by QB64 (and their C++ translation)

I don't know which of these Fifi was talking about.

1) From my point of view: if I set the IDE's Language... option to my country's code page, Windows CP1252 (see
https://en.wikipedia.org/wiki/Character_encoding
), I get the result you can see in attachment 1 in the QB64 IDE, but it does let me type my native-language characters...
But why do I have no problem with these characters when I run QBasic in DOSBox?


2) Even with this setting in the QB64 IDE, the issues with I/O from the keyboard and from files remain... I think the issue is how the QB64 IDE and its statements handle these codes... in fact, as you can see in attachment 2, the screen output fails, and so does input from the keyboard and from the file... BUT if you open the file.txt created by the program in Notepad.exe, you find the right characters... so the issue lies between QB64's internal structure and the OS!

Thanks for reading...
Programming isn't difficult, only it's  consuming time and coffee

TempodiBasic
Re: Unicode ?
« Reply #5 on: September 12, 2018, 03:52:40 pm »
@Petr
Using your SHELL command
Quote
Here is a way to use this with "autodetection" for programs used worldwide:
I get LCID 1040, it-IT.

Where can I find the DATA for 1040?
Google gives me many useless links with no table, whereas if I search the web for CP1252 (Windows Western Europe) I get a table with hex values. Can I use them directly as DATA, or must I convert them to decimal?
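
One way to use a hex table without converting it by hand, as a rough sketch: keep the hex digits as strings in DATA and let VAL convert them through the &H prefix. The eight values below are the hex form of CP1252 codes 128 to 135, with "0" marking an undefined position; the variable names are made up for the example.

```qb64
' Hex code points pasted from a code page table, stored as strings.
DATA "20AC","0","201A","0192","201E","2026","2020","2021"

FOR ASCIIcode = 128 TO 135
    READ hexValue$
    unicode& = VAL("&H" + hexValue$) ' e.g. "20AC" becomes 8364
    PRINT ASCIIcode; "-> U+"; hexValue$; " ="; unicode&
    ' Skip undefined positions instead of mapping them to code point 0.
    IF unicode& > 0 THEN _MAPUNICODE unicode& TO ASCIIcode
NEXT
```

A full table would use the same loop from 128 TO 255 with all 128 values.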

Thanks

Petr
Re: Unicode ?
« Reply #6 on: September 12, 2018, 04:41:01 pm »
Hi, for me it returns 1029, CZ. I was talking about the "CZ" or "IT" part. I use code page 1250; what 1029 or your 1040 means, I don't know.

Your CP1252 table is this:


Microsoft_windows_cp1252:
DATA 8364,0,8218,402,8222,8230,8224,8225,710,8240,352,8249,338,0,381,0
DATA 0,8216,8217,8220,8221,8226,8211,8212,732,8482,353,8250,339,0,382,376
DATA 160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175
DATA 176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191
DATA 192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207
DATA 208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223
DATA 224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239
DATA 240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255

Code Table 1252 http://en.wikipedia.org/wiki/Windows-1252

MS-DOS ASCII code page differences

"ä" is &H84 in CP437, &HE4 in Windows-1252, &HE4 in Unicode.
"ö" is &H94 in CP437, &HF6 in Windows-1252, &HF6 in Unicode.

"÷" is &HF6 in CP437, &HF7 in Windows-1252, &HF7 in Unicode.
"Σ" is &HE4 in CP437, not present in Windows-1252, &H3A3 in Unicode.

Delete my DATA block, use this one instead, and write here whether it works. I copied it from the QB64 help.

Bert22306
Re: Unicode ?
« Reply #7 on: September 12, 2018, 05:19:49 pm »
Fifi, if you can do what Petr suggests, maybe that's your best approach.

Otherwise, for languages like French, Italian, Spanish, and even German to an extent, you can also get by very well with the default CP437 mapping. For these languages (even German, if you ignore the scharfes s, a single character for "ss" that can also simply be written "ss"), the mapping of characters above 127 is shown here:

https://en.wikipedia.org/wiki/Code_page_437

Scroll down that page to the table. For the characters above 127, hold <alt> and type the character number on the keypad. For example, <alt>130 is é (e with an acute accent).

These accented characters show up just fine in the IDE set to the default CP437 and in QB64 PRINT statements. BUT, if you write the output to a text file, they will not display correctly in Notepad, because Notepad does not read the file as CP437; it expects the Windows code page or Unicode. They will display correctly in MS Word if, when prompted (open the text file, Word says "encoded text"), you choose MS-DOS.
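
A possible workaround, sketched under the assumption that the function form of _MAPUNICODE returns the code point currently mapped to an ASCII code (CP437 by default): convert the text to UTF-8 before writing it, and start the file with a byte order mark so that Notepad recognizes the encoding. ToUTF8$ and out.txt are hypothetical names.

```qb64
text$ = "caf" + CHR$(130) ' CHR$(130) is e-acute in CP437

OPEN "out.txt" FOR OUTPUT AS #1
' UTF-8 byte order mark (EF BB BF), then the converted text.
PRINT #1, CHR$(239) + CHR$(187) + CHR$(191) + ToUTF8$(text$)
CLOSE #1

FUNCTION ToUTF8$ (source$)
    ' Hypothetical helper: looks up each upper-ASCII byte in the current
    ' _MAPUNICODE table and encodes the code point as 1 to 3 UTF-8 bytes.
    result$ = ""
    FOR i = 1 TO LEN(source$)
        code& = ASC(source$, i)
        IF code& > 127 THEN code& = _MAPUNICODE(code&)
        SELECT CASE code&
            CASE IS < 128
                result$ = result$ + CHR$(code&)
            CASE IS < 2048
                result$ = result$ + CHR$(192 + (code& \ 64))
                result$ = result$ + CHR$(128 + (code& AND 63))
            CASE ELSE
                result$ = result$ + CHR$(224 + (code& \ 4096))
                result$ = result$ + CHR$(128 + ((code& \ 64) AND 63))
                result$ = result$ + CHR$(128 + (code& AND 63))
        END SELECT
    NEXT
    ToUTF8$ = result$
END FUNCTION
```

Three bytes are enough here because every character in these code pages lies below U+FFFF.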

Seems more cumbersome than it should be, however this approach works with no changes to the default CP437. At least, for the main West Euro languages. Even Portuguese?
« Last Edit: September 12, 2018, 05:22:36 pm by Bert22306 »

TempodiBasic
Re: Unicode ?
« Reply #8 on: September 12, 2018, 05:26:03 pm »
Hi Petr

Thanks

1.
Yes, I can confirm that loading a new font and remapping it with _MAPUNICODE using the right character set solves my issue 2: INPUT and PRINT now work from the keyboard and with text files.
Now I must remember to load a new font and do the remapping when initializing all my programs, and they can speak Italian...
It is simple and fast! Thanks.
Maybe in the future we will also find out how to set QB64's built-in font for a local language.

2.
Your QB64 IDE help is more powerful than mine!
I must dig further into the Alphabetical index to get the tables for languages. :-)

RhoSigma (QB64 Developer)
Re: Unicode ?
« Reply #9 on: September 12, 2018, 05:28:20 pm »
All of this is specific to a single language; it is hopeless to make any reliably international program with QB64. You can switch the IDE to your native language, but only if you also set a custom Unicode font. You can make your program use your native language with _MAPUNICODE, but again only with a suitable font.

You cannot write a program in your native language and have it magically work in any other language, even if you swap in the Unicode DATA for the other language. This is because all hardcoded (literal) strings in your program are still in the encoding of your native language, no matter which other language you switch to with _MAPUNICODE. The only way this would work is if all hardcoded strings were stored internally as full Unicode characters in a QB64 program; then you could use _MAPUNICODE to map that "internal" Unicode to one country's ASCII code or another's by using the correct DATA table. As long as hardcoded strings are saved as ASCII in the encoding of the language the IDE is set to, you will never reach your expected goal.

From that point of view, _MAPUNICODE is almost useless: you can use it to make your own language work, but that's it.
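
To illustrate the "internal Unicode" idea only, here is a rough sketch that stores a string as Unicode code points and translates it to whatever code page is currently mapped. It assumes the function form of _MAPUNICODE returns the code point mapped to an ASCII code; AsciiFor is a hypothetical helper, and this is not how QB64 stores literals.

```qb64
' "café!" as Unicode code points, preceded by the number of characters.
DATA 5,99,97,102,233,33

READ count
FOR n = 1 TO count
    READ codepoint&
    PRINT CHR$(AsciiFor(codepoint&));
NEXT
PRINT

FUNCTION AsciiFor (codepoint&)
    ' Hypothetical helper: reverse lookup in the current _MAPUNICODE table.
    AsciiFor = 63 ' "?" when the active code page lacks the character
    IF codepoint& < 128 THEN AsciiFor = codepoint&: EXIT FUNCTION
    FOR code = 128 TO 255
        IF _MAPUNICODE(code) = codepoint& THEN AsciiFor = code: EXIT FUNCTION
    NEXT
END FUNCTION
```

Because the lookup runs against the active mapping, the same stored code points would resolve to different ASCII codes after a different DATA table has been loaded with _MAPUNICODE.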
My Projects:   https://qb64forum.alephc.xyz/index.php?topic=809
GuiTools - A graphic UI framework (can do multiple UI forms/windows in one program)
Libraries - ImageProcess, StringBuffers (virt. files), MD5/SHA2-Hash, LZW etc.
Bonus - Blankers, QB64/Notepad++ setup pack

Bert22306
Re: Unicode ?
« Reply #10 on: September 12, 2018, 05:40:03 pm »
RhoSigma has a good point, which is not limited to just the use of special characters, though. Ultimately, we need a lingua franca. For the characters problem, we should all use either English or Latin. Problem solved. :)


Bert22306
Re: Unicode ?
« Reply #12 on: September 12, 2018, 09:26:51 pm »
Wow, what difficult languages, Tempo. English is easier, and doesn't need accents.

I think other languages could also get by without accents, and apparently, some accents, like the "accent circonflexe" in French (the hat accent, like ê) are going into disuse. About time. (In case you're curious, that circonflexe accent indicates that at some point, the letter s was dropped from the word used in French. For example, tête in French, testa in Italian. Or forêt in French, foresta in Italian. Fête, festa. Interesting, but not terribly essential!)

We already talked about Italian use of accents, and I'd say also Spanish and Portuguese, where accents are particularly unessential.

I mean, come on. In Spanish, who needs an accent for names like Antonio or Lopez? It's silly. António López. Come on! Don't you think so, Tempo? How else would you pronounce those names?

It's just a matter of getting used to understanding pronunciation and even meaning, by the context, instead of having it explicitly shown with an accent. Even in English, letter combinations and diphthongs can be pronounced differently, and no one uses accents for this. Like, the letters "gh" are pronounced differently, in "trough" and in "through," and no one seems to care.

In the Internet era, we should start a worldwide campaign to banish useless accents. Rah rah.

« Last Edit: September 12, 2018, 09:39:16 pm by Bert22306 »

Fifi
Re: Unicode ?
« Reply #13 on: September 12, 2018, 10:53:48 pm »
Hi Bert22306

Thank you for your post.

Quote
In the Internet era, we should start a worldwide campaign to banish useless accents. Rah rah.

Sorry but you have it all wrong.

I do not know where you are from, but let me tell you very carefully that it is because of positions like yours that American firms like GAFA are hated by the whole world, because of their disrespect for the cultures and languages of other countries.

If one had to follow your reasoning, then English should not be chosen because it is not, by far, the most spoken language in the world.

In that case, it would be better to choose Hindi, Chinese or even Arabic (but you would have to learn it, which, as far as I know, is not a specialty of English-speaking people).

Moreover, since QB64 claims to be compatible with Microsoft QB4.5, then it really is, and supports Unicode.

PS: I lived almost 12 years in the USA and I always try to express myself as well as possible in this language, which is not my native language and which is also different from that used in England, in Australia, in New Zealand, etc.

Nevertheless, I do not accept that my native language, or others, is distorted by the lack of development of a computer tool.

That said, this friendly remark is also due to the history of QB64 because I think Galleon never thought that his tool would be used around the world.

This is not a reason to camp on this position, especially if you want to see it used massively around the world, which is becoming more and more obvious by the addition of tools like InForm and vWATCH.

Thank you for thinking twice.

Cordially.
Fifi

Bert22306
Re: Unicode ?
« Reply #14 on: September 12, 2018, 11:25:48 pm »
Quote
Sorry but you have it all wrong.

I do not know where you are from, but let me tell you very carefully that it is because of positions like yours that American firms like GAFA are hated by the whole world, because of their disrespect for the cultures and languages of other countries.

Latin needed no accents. It came multiple centuries before English, Fifi.

Quote
Nevertheless, I do not accept that my native language, or others, is distorted by the lack of development of a computer tool.

It's a historical evolution that has happened at other times too. For example, both Vietnam and Turkey, in the early 20th century, dropped their scripts and adopted the Roman alphabet. And in French, at least that one accent, the circumflex, is beginning to disappear in common usage.

None of this is "distortion." It's what happens, to facilitate such things as education or global communications. The Internet being just another medium, a recent one, in global communications.

Anyway, you take this way too seriously, Fifi. It was meant more in jest.