

Messages - Unma

Programs / Re: Print Unicode, UTF-8
« on: April 06, 2020, 08:55:59 am »
Just to join the club: I also have a QB64 function that converts UTF-8 strings into Unicode code points.

Not knowing about your efforts, I wrote it as an aid for (hopefully automated) translation among many languages.
Once again I did something needless :-)

It is written in such a way that even 8 years from now I will understand what I was doing today. And that is not always the case :-)
Well, it may be helpful to somebody unfamiliar with the Unicode standard.
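
For anyone new to UTF-8: a character takes 1 to 4 bytes, and the high bits of its first (head) byte say how many. 0xxx xxxx is plain ASCII, 110x xxxx starts 2 bytes, 1110 xxxx starts 3, 1111 0xxx starts 4, and every following data byte looks like 10xx xxxx. A minimal sketch of just that classification step, assuming a hard-coded sample "Aé€" built from its raw bytes (41, C3 A9, E2 82 AC); the variable names are illustrative only:

```qb64
' Sketch: split a UTF-8 string into per-character byte groups.
' s$ holds "Aé€" as raw UTF-8 bytes (sample data chosen for illustration).
DIM i AS LONG, k AS LONG, n AS LONG, hb AS LONG, count AS LONG
s$ = "A" + CHR$(&HC3) + CHR$(&HA9) + CHR$(&HE2) + CHR$(&H82) + CHR$(&HAC)
i = 1
DO WHILE i <= LEN(s$)
    hb = ASC(MID$(s$, i, 1))
    IF (hb AND &H80) = 0 THEN ' 0xxx xxxx: ASCII, 1 byte
        n = 1
    ELSEIF (hb AND &HE0) = &HC0 THEN ' 110x xxxx: head of 2 bytes
        n = 2
    ELSEIF (hb AND &HF0) = &HE0 THEN ' 1110 xxxx: head of 3 bytes
        n = 3
    ELSEIF (hb AND &HF8) = &HF0 THEN ' 1111 0xxx: head of 4 bytes
        n = 4
    ELSE ' 10xx xxxx or 1111 1xxx: not a head byte, step over it
        n = 1
    END IF
    grp$ = ""
    FOR k = i TO i + n - 1 ' collect the bytes of this character as hex
        IF k <= LEN(s$) THEN grp$ = grp$ + RIGHT$("0" + HEX$(ASC(MID$(s$, k, 1))), 2) + " "
    NEXT
    PRINT n; "byte(s): "; grp$
    count = count + 1
    i = i + n
LOOP
PRINT count; "characters"
```

It should print one line per character (1, 2 and 3 bytes long) and a character count of 3.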

Code: QB64:
'------------------------------
' INPUT:  UTF-8 string
'------------------------------
' OUTPUT: ERROR   UTF2UNICODE < 0
'         ---------------------
'         OK      UTF2UNICODE >= 0 AND UTF2UNICODE <= &H10FFFF
'                 The recognised Unicode character is removed from the beginning of the argument. This can be turned off (see at the bottom).
'------------------------------
FUNCTION UTF2UNICODE& (txt$)
    DIM chlen AS INTEGER
    DIM hb AS LONG
    DIM db AS LONG
    DIM result AS LONG

    IF LEN(txt$) = 0 THEN
        UTF2UNICODE& = -1 'Invalid argument
        EXIT FUNCTION
    END IF
    result = -2 'Unspecified error
    chlen = 1
    hb = ASC(txt$) 'head-byte only

    IF (hb AND &B10000000) = 0 THEN ' 0xxx xxxx?  TRUE=byte is ASCII character
        result = hb
    ELSE
        IF (hb AND &B11100000) = &B11000000 THEN ' 110x xxxx?  TRUE=byte is 1st of 2 bytes
            'head-byte + data-byte
            ' 110xxxxx   10yyyyyy
            '---------------------
            chlen = 2
            IF LEN(txt$) < chlen THEN UTF2UNICODE& = -2: EXIT FUNCTION ' truncated sequence, txt$ left unchanged
            'result = (hb AND &B00011111) * &B01000000
            result = (hb AND &H1F) * &H40 '            head-byte  shifted left 6 places  result | 0000 0000  0000 0000  0000 0xxx  xx00 0000 |
            db = ASC(MID$(txt$, 2, 1)) '                     data-byte
            'result = result OR (db AND &B00111111)
            result = result OR (db AND &H3F) '              data-byte  copied                 result | 0000 0000  0000 0000  0000 0xxx  xxyy yyyy |
        ELSE
            IF (hb AND &B11110000) = &B11100000 THEN ' 1110 xxxx?  TRUE=byte is 1st of 3 bytes
                'head-byte + data-byte1 + data-byte2
                ' 1110xxxx   10yyyyyy     10zzzzzz
                '-----------------------------------
                chlen = 3
                IF LEN(txt$) < chlen THEN UTF2UNICODE& = -2: EXIT FUNCTION ' truncated sequence, txt$ left unchanged
                'result = (hb AND &B00001111) * &B 0001 0000 0000 0000
                result = (hb AND &HF) * &H1000 '               head-byte   shifted left 12 places  result | 0000 0000  0000 0000  xxxx 0000  0000 0000 |
                db = ASC(MID$(txt$, 2, 1)) '                     data-byte1
                result = result OR ((db AND &H3F) * &H40) '  data-byte1  shifted left 6 places   result | 0000 0000  0000 0000  xxxx yyyy  yy00 0000 |
                db = ASC(MID$(txt$, 3, 1)) '                     data-byte2
                result = result OR (db AND &H3F) '           data-byte2  copied                  result | 0000 0000  0000 0000  xxxx yyyy  yyzz zzzz |
            ELSE
                IF (hb AND &B11111000) = &B11110000 THEN ' 1111 0xxx?  TRUE=byte is 1st of 4 bytes
                    'head-byte + data-byte1 + data-byte2 + data-byte3
                    ' 11110xxx   10yyyyyy     10zzzzzz     10wwwwww
                    '------------------------------------------------
                    chlen = 4
                    IF LEN(txt$) < chlen THEN UTF2UNICODE& = -2: EXIT FUNCTION ' truncated sequence, txt$ left unchanged
                    'result = (hb AND &B00000111) * &B 0000 0100  0000 0000  0000 0000
                    result = (hb AND &H7) * &H40000 '              head-byte   shifted left 18 places  result | 0000 0000  000x xx00  0000 0000  0000 0000 |
                    db = ASC(MID$(txt$, 2, 1)) '                      data-byte1
                    result = result OR ((db AND &H3F) * &H1000) ' data-byte1  shifted left 12 places  result | 0000 0000  000x xxyy  yyyy 0000  0000 0000 |
                    db = ASC(MID$(txt$, 3, 1)) '                      data-byte2
                    result = result OR ((db AND &H3F) * &H40) '   data-byte2  shifted left 6 places   result | 0000 0000  000x xxyy  yyyy zzzz  zz00 0000 |
                    db = ASC(MID$(txt$, 4, 1)) '                      data-byte3
                    result = result OR (db AND &H3F) '           data-byte3  copied                  result | 0000 0000  000x xxyy  yyyy zzzz  zzww wwww |
                ELSE
                    'Not a head-byte.
                    result = hb
                END IF
            END IF
        END IF
    END IF
    IF chlen < LEN(txt$) THEN txt$ = MID$(txt$, chlen + 1) ELSE txt$ = "" ' Comment out this line to leave the string argument unchanged.
    UTF2UNICODE& = result
END FUNCTION
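
The four-byte branch is the easiest place for a mask or shift slip, so here is a minimal hand check, assuming the test sequence F4 8F BF BF, which encodes U+10FFFF, the upper limit named in the function header. The head byte carries three payload bits (mask &H7, binary 111) that belong at bits 20 to 18, so they are multiplied by 2^18 = &H40000, as the commented binary formula in that branch says. Multiplying by &H400000 (2^22) instead would turn the same bytes into &H100FFFF, and a mask of &H6 would drop the lowest payload bit of head bytes F1 and F3. The variable names are illustrative only:

```qb64
' Sketch: decode the 4-byte sequence F4 8F BF BF by hand.
' Expected result: U+10FFFF, the highest Unicode code point.
DIM hb AS LONG, db1 AS LONG, db2 AS LONG, db3 AS LONG, cp AS LONG
s$ = CHR$(&HF4) + CHR$(&H8F) + CHR$(&HBF) + CHR$(&HBF)
hb = ASC(MID$(s$, 1, 1))
db1 = ASC(MID$(s$, 2, 1))
db2 = ASC(MID$(s$, 3, 1))
db3 = ASC(MID$(s$, 4, 1))
cp = (hb AND &H7) * &H40000          ' 3 head-byte bits  -> bits 20..18
cp = cp OR ((db1 AND &H3F) * &H1000) ' 6 data-byte1 bits -> bits 17..12
cp = cp OR ((db2 AND &H3F) * &H40)   ' 6 data-byte2 bits -> bits 11..6
cp = cp OR (db3 AND &H3F)            ' 6 data-byte3 bits -> bits 5..0
PRINT "U+"; HEX$(cp)                 ' prints U+10FFFF
```

Because the four pieces never share a bit position, OR and + give the same result here; OR simply makes the bit packing explicit.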


I intend to add some more "error codes" (negative return values) at some later point. Right now I have to deal with another issue:
there is obviously some bad blood between the QB64 IDE and my OS (Linux Mint).
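
On the planned error codes: one possible guide is the list of malformations that RFC 3629 rules out. These are missing bytes, data bytes not of the form 10xx xxxx, overlong encodings (a value that would fit in fewer bytes), UTF-16 surrogates (&HD800 to &HDFFF) and values above &H10FFFF. The sketch below checks one character per sample string. The codes -3 to -6, the sample data and all names are hypothetical and not part of UTF2UNICODE:

```qb64
' Sketch of checks that extra error codes could report.
' The codes -3..-6 are hypothetical; UTF2UNICODE itself defines only -1 and -2.
'   -3 truncated sequence     -4 bad head byte or data byte
'   -5 overlong form          -6 surrogate or value above &H10FFFF
DIM sample(1 TO 5) AS STRING, errs(1 TO 5) AS LONG, cps(1 TO 5) AS LONG
DIM t AS LONG, k AS LONG, n AS LONG, e AS LONG, hb AS LONG, db AS LONG, cp AS LONG
sample(1) = CHR$(&HE2) + CHR$(&H82) + CHR$(&HAC) ' U+20AC, well formed
sample(2) = CHR$(&HE2) + CHR$(&H82)              ' last byte missing
sample(3) = CHR$(&HC3) + "A"                     ' "A" is not 10xx xxxx
sample(4) = CHR$(&HC0) + CHR$(&HAF)              ' overlong "/" (U+002F)
sample(5) = CHR$(&HED) + CHR$(&HA0) + CHR$(&H80) ' surrogate U+D800
FOR t = 1 TO 5
    s$ = sample(t)
    hb = ASC(MID$(s$, 1, 1))
    e = 0: n = 0: cp = 0
    IF (hb AND &H80) = 0 THEN
        n = 1: cp = hb
    ELSEIF (hb AND &HE0) = &HC0 THEN
        n = 2: cp = hb AND &H1F
    ELSEIF (hb AND &HF0) = &HE0 THEN
        n = 3: cp = hb AND &HF
    ELSEIF (hb AND &HF8) = &HF0 THEN
        n = 4: cp = hb AND &H7
    ELSE
        e = -4 ' stray data byte or 1111 1xxx
    END IF
    IF e = 0 AND LEN(s$) < n THEN e = -3 ' check length before any ASC on data bytes
    IF e = 0 THEN
        FOR k = 2 TO n
            db = ASC(MID$(s$, k, 1))
            IF (db AND &HC0) <> &H80 THEN e = -4: EXIT FOR
            cp = (cp * &H40) OR (db AND &H3F) ' append 6 payload bits
        NEXT
    END IF
    IF e = 0 THEN
        SELECT CASE n ' smallest code point each length may carry
            CASE 2
                IF cp < &H80 THEN e = -5
            CASE 3
                IF cp < &H800 THEN e = -5
            CASE 4
                IF cp < &H10000& THEN e = -5
        END SELECT
    END IF
    IF e = 0 AND ((cp >= &HD800& AND cp <= &HDFFF&) OR cp > &H10FFFF) THEN e = -6
    errs(t) = e: cps(t) = cp
    PRINT "sample"; t; "status"; e; "U+"; HEX$(cp)
NEXT
```

The trailing & on &HD800& and &HDFFF& forces LONG literals; without it, QBasic-style hex constants from &H8000 to &HFFFF are read as negative INTEGERs and the surrogate range test would misfire. Checking the length first also matters because ASC of an empty string is an Illegal function call error.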
