QB64.org Forum

Active Forums => QB64 Discussion => Topic started by: MLambert on March 06, 2021, 04:58:12 am

Title: Double Byte Characters
Post by: MLambert on March 06, 2021, 04:58:12 am: Hi,
Some of my data has Serbian names containing ć which I need to convert to c.
The hex UTF-8 bytes (a double-byte code) are C4 87, and the decimal code point is 263. I was using C487$ = Chr$(263) and looking for C487$ in the field,
but Chr$ only goes up to 255. How can I get around this?

The code I am using is: If X$ = C487$ Then X$ = "c"

As long as the argument to CHR$ is below 256 everything is OK, but when it is above 255 I get a function error, meaning CHR$ cannot take values above 255.

Thanks,

Mike
Title: Re: Double Byte Characters
Post by: luke on March 06, 2021, 06:02:27 am: Quote from: MLambert on March 06, 2021, 04:58:12 am
Some of my data has Serbian names containing ć which I need to convert to c.
This seems like a good way to offend a lot of Serbians.

If you're looking to print such text I recommend using UPrintString from RhoSigma's utility: https://www.qb64.org/forum/index.php?topic=2248.msg130123

Among UTF-8's nice properties is that you can just search for the byte sequence CHR$(&HC4) + CHR$(&H87) in your full string. It's then up to you to use LEFT$/RIGHT$/MID$ as appropriate. You could also use the (potentially) more aesthetically pleasing MKI$(&H87C4), but note the endian swap: MKI$ stores the low byte first.
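
To illustrate the two spellings mentioned above, here is a minimal QB64 sketch. It assumes the character in question is ć (U+0107, UTF-8 bytes C4 87); the variable names are made up for the example.

```basic
' Two ways to build the UTF-8 byte pair for U+0107 (bytes C4 87).
viaChr$ = CHR$(&HC4) + CHR$(&H87)

' MKI$ packs a 16-bit integer low byte first, so the hex digits
' appear swapped relative to the byte order in the string.
viaMki$ = MKI$(&H87C4)

IF viaChr$ = viaMki$ THEN PRINT "Both spellings give the same two bytes"

' The "character" occupies two bytes in a QB64 string, so LEN counts 2.
PRINT LEN(viaChr$)
```

Either form can be used as the search string; the CHR$ version is arguably easier to check against a hex dump.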

But I would like to know why you're replacing ć with c, because they're entirely different letters.
Title: Re: Double Byte Characters
Post by: MLambert on March 07, 2021, 02:50:14 am: Thanks, Luke.

How do I "search for the sequence CHR$(&HC4) + CHR$(&H87) in your full string"?

Mike
Title: Re: Double Byte Characters
Post by: luke on March 07, 2021, 06:18:04 am: http://www.qb64.org/wiki/INSTR
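
As a rough sketch of how INSTR could be combined with LEFT$ and MID$ for this job: ReplaceAll$ below is a hypothetical helper (not a built-in QB64 function), and the sample string is invented for the example. It assumes the data is UTF-8 and that the target is the byte pair C4 87.

```basic
cAcute$ = CHR$(&HC4) + CHR$(&H87) ' UTF-8 bytes of U+0107

' Made-up sample data containing the two-byte sequence twice.
X$ = "Petrovi" + cAcute$ + " i Jovanovi" + cAcute$

PRINT ReplaceAll$(X$, cAcute$, "c") ' prints: Petrovic i Jovanovic
END

' Hypothetical helper: replace every occurrence of find$ in text$ with repl$.
FUNCTION ReplaceAll$ (text$, find$, repl$)
    result$ = text$
    IF LEN(find$) = 0 THEN ' nothing to search for
        ReplaceAll$ = result$
        EXIT FUNCTION
    END IF
    p& = INSTR(result$, find$)
    DO WHILE p& > 0
        ' Keep everything before the match, insert repl$, keep the rest.
        result$ = LEFT$(result$, p& - 1) + repl$ + MID$(result$, p& + LEN(find$))
        ' Resume searching just past the text that was inserted.
        p& = INSTR(p& + LEN(repl$), result$, find$)
    LOOP
    ReplaceAll$ = result$
END FUNCTION
```

Comparing a whole field with = only matches when the field consists of exactly those two bytes, which is why a search with INSTR is needed for names that contain the character somewhere inside.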