Author Topic: Unicode ? (Read 64125 times)

Fifi · « **on:** September 11, 2018, 07:18:09 pm »

Hi all,

One thing that really hampers the use of QB64 is that it is limited to the basic unaccented ASCII table.

As a result, it cannot be used correctly for languages other than English, and it similarly limits InForm whenever a different set of characters is needed, such as the French-specific set "à â ä é è ê ë ô ö ù û ü î ï ç" (the same problem exists for many other languages such as German, Spanish, Swedish, etc.).

So, is there a plan to change that soon?

Thank you.
Fifi

Petr · « **Reply #1 on:** September 12, 2018, 09:44:37 am »

Hi. Use the _MAPUNICODE statement to display Unicode characters.

Code: QB64:

cj ' map ASCII codes 128-255 to the Czech (Windows-1250) Unicode code points
_FONT _LOADFONT("cyberbit.ttf", 40, "MONOSPACE")
PRINT "Zdravím tě v českém jazyce!"

SUB cj
    ' Makes Czech characters display correctly on the screen.
    ' For another language, swap in a different DATA block. Blocks for more
    ' languages are in the QB64 help (Shift + F1 / Alphabetical index /
    ' _MAPUNICODE statement / Code Pages).
    RESTORE Microsoft_windows_cp1250
    FOR ASCIIcode = 128 TO 255
        READ unicode ' next Unicode code point from the DATA block
        _MAPUNICODE unicode TO ASCIIcode
    NEXT

    EXIT SUB

    Microsoft_windows_cp1250:
    DATA 8364,0,8218,0,8222,8230,8224,8225,0,8240,352,8249,346,356,381,377
    DATA 0,8216,8217,8220,8221,8226,8211,8212,0,8482,353,8250,347,357,382,378
    DATA 160,711,728,321,164,260,166,167,168,169,350,171,172,173,174,379
    DATA 176,177,731,322,180,181,182,183,184,261,351,187,317,733,318,380
    DATA 340,193,194,258,196,313,262,199,268,201,280,203,282,205,206,270
    DATA 272,323,327,211,212,336,214,215,344,366,218,368,220,221,354,223
    DATA 341,225,226,259,228,314,263,231,269,233,281,235,283,237,238,271
    DATA 273,324,328,243,244,337,246,247,345,367,250,369,252,253,355,729
 
END SUB

More DATA blocks for other languages are in the IDE help. Search for _MAPUNICODE and then open Code Pages.

FellippeHeitor · « **Reply #2 on:** September 12, 2018, 09:54:38 am »

Quote from: Fifi on September 11, 2018, 07:18:09 pm

So, is there a plan to change that soon?

It involves some deep C++ tweaking which is not my area. With the other team member actively contributing to the code base being Luke, and with his current busy college routine, don't expect it soon.

InForm has an option in the Edit menu to toggle the code page for a generated form. That may be useful to you to some extent. My programs in Portuguese (none ever shared here, as they're used for work) are all InForm-based and all display/accept characters with diacritic marks.

Petr · « **Reply #3 on:** September 12, 2018, 10:02:21 am »

Here is a way to use this with "autodetection" in programs meant for worldwide use:

1) Use PowerShell to determine your locale setting:

Code: QB64:

SHELL "powershell get-culture > kbd.txt"

2) Read kbd.txt; now you know which language is in use.
3) Use the matching Unicode DATA block.
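
Steps 2 and 3 could look roughly like the sketch below. It is only an illustration under a few assumptions: the name GuessCodePage% and its culture-to-code-page list are hypothetical and deliberately incomplete, and the encoding of kbd.txt is not known in advance (it may be UTF-16 or OEM text depending on how the redirect is handled), so the code keeps printable ASCII only.

```qb64
' Sketch: guess a Windows code page from the output of "powershell get-culture".
SHELL "powershell get-culture > kbd.txt"

' Read the whole file as raw bytes.
OPEN "kbd.txt" FOR BINARY AS #1
raw$ = ""
IF LOF(1) > 0 THEN
    raw$ = SPACE$(LOF(1))
    GET #1, , raw$
END IF
CLOSE #1

' Keep printable ASCII, turn line breaks into spaces, drop everything else
' (this removes the zero bytes and byte-order mark of a UTF-16 file).
culture$ = ""
FOR i = 1 TO LEN(raw$)
    c = ASC(MID$(raw$, i, 1))
    IF c >= 32 AND c <= 126 THEN
        culture$ = culture$ + CHR$(c)
    ELSEIF c = 10 OR c = 13 THEN
        culture$ = culture$ + " "
    END IF
NEXT

PRINT "Suggested code page:"; GuessCodePage%(culture$)

FUNCTION GuessCodePage% (culture$)
    ' Hypothetical helper with a deliberately short list; extend it as needed.
    cp% = 1252 ' Western European default (it-IT, fr-FR, de-DE, ...)
    IF INSTR(culture$, " cs-") > 0 OR INSTR(culture$, " pl-") > 0 OR INSTR(culture$, " hu-") > 0 THEN cp% = 1250
    IF INSTR(culture$, " ru-") > 0 OR INSTR(culture$, " bg-") > 0 THEN cp% = 1251
    IF INSTR(culture$, " el-") > 0 THEN cp% = 1253
    IF INSTR(culture$, " tr-") > 0 THEN cp% = 1254
    GuessCodePage% = cp%
END FUNCTION
```

The returned number would then pick the RESTORE label, for example Microsoft_windows_cp1250 for 1250.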

TempodiBasic · « **Reply #4 on:** September 12, 2018, 03:48:02 pm »

Hi Guys

About Unicode:
we must distinguish between two issues...

1) Unicode in the QB64 IDE
2) Unicode in programs compiled by QB64 (and its C++ translation)

I don't know which of these Fifi was asking about.

1) If I use the IDE's Language... option to select my country's code page, Windows CP1252 (see
https://en.wikipedia.org/wiki/Character_encoding),
I get the result shown in attachment 1 for the QB64 IDE, although it does let me type my native-language characters.
Why, then, do I have no problem with these characters when I run QBasic in DOSBox?

2) Even with this setting in the QB64 IDE, the problems with keyboard and file I/O remain. I think the issue is how the QB64 IDE and its statements handle these codes: as you can see in attachment 2, screen output fails, and so does input from the keyboard and from the file. BUT if you open the file.txt created by the program in Notepad.exe, you find the right characters. So the issue lies between QB64's internal structure and the OS structure!

Thanks for reading...

TempodiBasic · « **Reply #5 on:** September 12, 2018, 03:52:40 pm »

@Petr
Using your SHELL command

Quote

Here is a way to use this with "autodetection" in programs meant for worldwide use:

I get LCID 1040, it-IT.

Where can I find the DATA for 1040?
Google gives me many useless links with no table, whereas if I search the web for cp1252 (Windows Western Europe) I get a table with hex values. Can I use them directly as DATA, or must I convert them to decimal?

Thanks

Petr · « **Reply #6 on:** September 12, 2018, 04:41:01 pm »

Hi, for me it returns 1029, CZ. I meant the "CZ" or "IT" part. I use code page 1250; what 1029 or your 1040 is, I don't know.

Your cp1252 table is this:

Microsoft_windows_cp1252:
DATA 8364,0,8218,402,8222,8230,8224,8225,710,8240,352,8249,338,0,381,0
DATA 0,8216,8217,8220,8221,8226,8211,8212,732,8482,353,8250,339,0,382,376
DATA 160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175
DATA 176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191
DATA 192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207
DATA 208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223
DATA 224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239
DATA 240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255

Code Table 1252: http://en.wikipedia.org/wiki/Windows-1252

MS-DOS ASCII code page differences

"ä" is &H84 in CP437, &HE4 in Windows-1252, &HE4 in Unicode.
"ö" is &H94 in CP437, &HF6 in Windows-1252, &HF6 in Unicode.

"÷" is &HF6 in CP437, &HF7 in Windows-1252, &HF7 in Unicode.
"Σ" is &HE4 in CP437, not present in Windows-1252, &H3A3 in Unicode.

Delete my DATA block, use this one instead, and write here whether it works. I copied it from the QB64 help.
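
On the hex question from Reply #5: the DATA blocks above hold decimal values, so hex code points from a published table can be converted first. The sketch below shows one way to do it. HexToDec& is a hypothetical helper written for this example, not a QB64 built-in.

```qb64
' Sketch: turn hex code points from a published table into decimal DATA values.
PRINT HexToDec&("20AC") ' Euro sign: prints 8364, the first value in the cp1252 block
PRINT HexToDec&("E4") ' prints 228
PRINT HexToDec&("3A3") ' prints 931

FUNCTION HexToDec& (h$)
    ' Hypothetical helper: accumulates one hex digit at a time.
    result& = 0
    FOR i = 1 TO LEN(h$)
        digit = INSTR("0123456789ABCDEF", UCASE$(MID$(h$, i, 1))) - 1
        IF digit < 0 THEN EXIT FOR ' stop at the first non-hex character
        result& = result& * 16 + digit
    NEXT
    HexToDec& = result&
END FUNCTION
```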

Bert22306 · « **Reply #7 on:** September 12, 2018, 05:19:49 pm »

Fifi, if you can do what Petr suggests, maybe that's your best approach.

Otherwise, for languages like French, Italian, Spanish, and even German to an extent, you can also get by very well with the default CP437 mapping. For these languages (German included, if you ignore the scharfes s, a single character for "ss" that one can also write tout simplement as "ss"), the mapping of characters above 127 is shown here:

https://en.wikipedia.org/wiki/Code_page_437

Scroll down that page to the table. For the characters above 127, you press <alt> and type the character number on the keypad. For example, <alt>130 is é (e avec accent aigu).

These accented characters show up just fine in the IDE set to the default CP437 and in QB64 PRINT statements. BUT, if you write the output to a text file, they won't display correctly in Notepad. Notepad does not read the file as CP437; it expects the Windows ANSI code page or Unicode. They will display correctly in MS Word if, when prompted (open the text file, Word says "encoded text"), you choose MS-DOS.

It seems more cumbersome than it should be, but this approach works with no changes to the default CP437. At least, for the main West European languages. Even Portuguese?
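
One possible workaround for the Notepad problem is to write the file as UTF-8 instead. The sketch below is only an illustration: ToUtf8$ is a hypothetical helper, and it assumes that the _MAPUNICODE function returns the code point currently mapped to a character code (the CP437 defaults when nothing has been remapped).

```qb64
' Sketch: save a line of CP437 text as UTF-8 so Notepad shows the accents.
text$ = "caf" + CHR$(130) ' CHR$(130) is e-acute in CP437

OPEN "out.txt" FOR OUTPUT AS #1
PRINT #1, ToUtf8$(text$)
CLOSE #1

FUNCTION ToUtf8$ (s$)
    ' Hypothetical helper: re-encodes each byte through the current mapping.
    result$ = ""
    FOR i = 1 TO LEN(s$)
        c& = ASC(MID$(s$, i, 1))
        IF c& >= 128 THEN c& = _MAPUNICODE(c&) ' code point mapped to this byte
        IF c& = 0 THEN c& = 63 ' unmapped code: write "?" instead
        IF c& < 128 THEN
            result$ = result$ + CHR$(c&) ' one byte
        ELSEIF c& < 2048 THEN
            result$ = result$ + CHR$(192 + (c& \ 64)) + CHR$(128 + (c& AND 63)) ' two bytes
        ELSE
            result$ = result$ + CHR$(224 + (c& \ 4096)) + CHR$(128 + ((c& \ 64) AND 63)) + CHR$(128 + (c& AND 63)) ' three bytes
        END IF
    NEXT
    ToUtf8$ = result$
END FUNCTION
```

For é (code point 233) the two-byte branch yields bytes 195 and 169, the UTF-8 form of U+00E9.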

TempodiBasic · « **Reply #8 on:** September 12, 2018, 05:26:03 pm »

Hi Petr

Thanks

1.
Yes, I can confirm that loading a new font and remapping it with _MAPUNICODE, using the right character set, solves my issue 2: INPUT and PRINT now work correctly with the keyboard and with text files.
Now I must remember to load a new font and do the remapping in the initialization of all my programs, and then they can speak Italian...
It is simple and fast! Thanks
In the future we may also find out how to set QB64's built-in font for a local language. Maybe.

2.
Your QB64 IDE Help is more powerful than mine!
I must dig further into the Alphabetical index to get the tables for other languages. :-)

RhoSigma · « **Reply #9 on:** September 12, 2018, 05:28:20 pm »

All this is specific to just one language; it is hopeless to make reliable international programs with QB64. You can switch the IDE to your native language, but only if you also set a custom Unicode font. You can make your program use your native language with _MAPUNICODE, but again only with a suitable font.

You cannot write a program in your native language and expect it to magically work in any other language, even if you swap in the Unicode DATA for the other language. This is because all hardcoded (literal) strings in your program are still in the encoding of your native language, no matter which other language you switch to with _MAPUNICODE. The only way this could work is if all hardcoded strings were stored internally as full Unicode characters in a QB64 program; then you could use _MAPUNICODE to map that "internal" Unicode to one country's ASCII codes or another's by using the correct DATA table. As long as hardcoded strings are saved as plain ASCII bytes, encoded in whatever language the IDE is set to, you will never reach your expected goal.

From that point of view _MAPUNICODE is almost useless: you can use it to make your own language work, but that's it.
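
A small sketch of the problem described here, under two assumptions: the font file from Reply #1 is available, and a 32-bit screen is used so that text already printed is not redrawn when the mapping changes. The stored byte 232 stays the same; only the letter drawn for it changes.

```qb64
' Sketch: the same stored byte shows a different letter under each mapping.
SCREEN _NEWIMAGE(640, 200, 32)
f& = _LOADFONT("cyberbit.ttf", 40, "MONOSPACE") ' assumed font file, as in Reply #1
IF f& <= 0 THEN PRINT "Font not found": END
_FONT f&

word$ = "caff" + CHR$(232) ' byte 232 is what the source file actually stores

_MAPUNICODE 232 TO 232 ' Windows-1252: U+00E8, e with grave accent
PRINT word$
_MAPUNICODE 269 TO 232 ' Windows-1250: U+010D, c with caron
PRINT word$
```

Under the first mapping the word should read as the Italian "caffè"; under the second, the last letter is drawn as "č" although the string has not changed.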

Bert22306 · « **Reply #10 on:** September 12, 2018, 05:40:03 pm »

RhoSigma has a good point, which is not limited to just the use of special characters, though. Ultimately, we need a lingua franca. For the characters problem, we should all use either English or Latin. Problem solved. :)

TempodiBasic · « **Reply #11 on:** September 12, 2018, 06:04:54 pm »

Hi Bert
I have my own opinion about which lingua franca is best...
Everyone interested in this issue, please vote by choosing among A, B, and C:

a) https://www.google.it/search?q=Geroglifici+egizi&stick=H4sIAAAAAAAAAONgFuLUz9U3MKwyKctT4gAxU9LKDLX4nPNzc_PzgjNTUssTK4sBY47ZuScAAAA&source=lnms&tbm=isch&sa=X&ved=0ahUKEwjT24ybubbdAhWCy4UKHUpWBKIQ_AUICigB&biw=1366&bih=669#imgrc=ZliW32-5uC8mEM:

b) https://www.google.it/search?q=cuneiforme&source=lnms&tbm=isch&sa=X&ved=0ahUKEwjcj_-mubbdAhUShxoKHZjWDw4Q_AUICigB&biw=1366&bih=669#imgrc=AHIxDmZF3euL0M:

c) https://www.google.it/search?q=sanscrito&source=lnms&tbm=isch&sa=X&ved=0ahUKEwjfkf7uubbdAhVJaBoKHUJiBX4Q_AUICigB&biw=1366&bih=669#imgrc=iMeabL26GLK2oM:

Awaiting your opinions.

Bert22306 · « **Reply #12 on:** September 12, 2018, 09:26:51 pm »

Ammappete che lingue difficili, Tempo. English is easier, and doesn't need accents.

I think other languages could also get by without accents, and apparently, some accents, like the "accent circonflexe" in French (the hat accent, like ê) are going into disuse. About time. (In case you're curious, that circonflexe accent indicates that at some point, the letter s was dropped from the word used in French. For example, tête in French, testa in Italian. Or forêt in French, foresta in Italian. Fête, festa. Interesting, but not terribly essential!)

We already talked about Italian use of accents, and I'd say also Spanish and Portuguese, where accents are particularly unessential.

I mean come on. In Spanish, who needs an accent for names, like Antonio or Lopez. It's silly. António López. Ma dai! Ti pare, Tempo? How else would you pronounce those names.

It's just a matter of getting used to understanding pronunciation and even meaning, by the context, instead of having it explicitly shown with an accent. Even in English, letter combinations and diphthongs can be pronounced differently, and no one uses accents for this. Like, the letters "gh" are pronounced differently, in "trough" and in "through," and no one seems to care.

In the Internet era, we should start a worldwide campaign to banish useless accents. Rah rah.

Fifi · « **Reply #13 on:** September 12, 2018, 10:53:48 pm »

Hi Bert22306

Thank you for your post.

Quote from: Bert22306 on September 12, 2018, 09:26:51 pm

In the Internet era, we should start a worldwide campaign to banish useless accents. Rah rah.

Sorry but you have it all wrong.

I do not know where you are from, but let me tell you very carefully that it is because of positions like yours that American firms like GAFA are hated by the whole world, for their disrespect for the cultures and languages of other countries.

If one had to follow your reasoning, then English should not be chosen, because it is not, by far, the most spoken language in the world.

In that case, it would be better to choose Hindi, Chinese or even Arabic (but then you have to learn it, which, as far as I know, is not a specialty of English-speaking people).

Moreover, since QB64 claims to be compatible with Microsoft QB4.5, it should really be so, and should support Unicode.

PS: I lived almost 12 years in the USA and I always try to express myself as well as possible in this language, which is not my native language and which also differs from the English used in England, Australia, New Zealand, etc.

Nevertheless, I do not accept that my native language, or others, is distorted by the lack of development of a computer tool.

That said, this friendly remark is also due to the history of QB64 because I think Galleon never thought that his tool would be used around the world.

This is no reason to stay entrenched in this position, especially if you want to see it used massively around the world, which is becoming more and more likely with the addition of tools like InForm and vWATCH.

Thank you for thinking twice.

Cordially.
Fifi

Bert22306 · « **Reply #14 on:** September 12, 2018, 11:25:48 pm »

Quote from: Fifi on September 12, 2018, 10:53:48 pm

Sorry but you have it all wrong.

I do not know where you are from, but let me tell you very carefully that it is because of positions like yours that American firms like GAFA are hated by the whole world, for their disrespect for the cultures and languages of other countries.

Latin needed no accents. It came multiple centuries before English, Fifi.

Quote

Nevertheless, I do not accept that my native language, or others, is distorted by the lack of development of a computer tool.

It's a historical evolution that has happened at other times too. For example, both Vietnam and Turkey, at the turn of the 20th Century, dropped their alphabets and adopted the Roman alphabet. And in French, at least that one accent, le circonflexe, is beginning to disappear in common usage.

None of this is "distortion." It's what happens, to facilitate such things as education or global communications. The Internet being just another medium, a recent one, in global communications.

Anyway, you take this way too seriously, Fifi. It was meant more in jest.
