Topic: BASE64

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
    • View Profile
    • Steve’s QB64 Archive Forum
Re: BASE64
« Reply #15 on: December 23, 2019, 11:46:57 pm »
Quote
And you say - 7 bit encoding? Hmmmm :) Is there a standard, something that specifies which characters to use? Definitely, I'll try to do this!

If I was going for 7-bit encoding, I’d go with 45 + 7-bit value, giving you a range of CHR$(45) to CHR$(172), which all copy/paste and display nicely in the QB64 IDE.

The reason why I’d start at 45?

It’s the next symbol after the comma, and we’d want to avoid the low value control codes, the quote, and the comma, so we won’t mess up our DATA statement.  It also allows for ease of conversion back and forth, without the need for a SELECT CASE table like Base64 uses.  (Just use CHR$(value + 45) and ASC(char$) - 45 to convert.)
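
As a minimal sketch of that conversion (the variable names are made up for illustration, and this only checks the offset mapping, not a full encoder), a round trip over all 128 values might look like this:

```qb64
'Round-trip every 7-bit value through the offset-45 mapping.
ok = -1
FOR value = 0 TO 127
    c$ = CHR$(value + 45) 'encode: 0..127 becomes CHR$(45)..CHR$(172)
    back = ASC(c$) - 45 'decode: character code back to the 7-bit value
    IF back <> value THEN ok = 0
NEXT
PRINT "lowest code: "; 0 + 45; " highest code: "; 127 + 45
IF ok THEN PRINT "all 128 values round-trip"
```

The highest code comes out as 172 because the largest 7-bit value is 127.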

I don’t think there’s any standard base-128 encoding which displays nicely in all programs (most would just use plain 7-bit ASCII, and the first 32 values of it are control codes), so you get to set whatever suits you the best for personal use.  ;)
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

SMcNeill
Re: BASE64
« Reply #16 on: December 25, 2019, 06:27:50 am »
And a Base128 version, just in time for Christmas for you Petr, if you're interested in it:

Code: QB64: [Select]
text$ = "ABCDEF"
PRINT "ORIGINAL: "; text$
a$ = B256to128(text$)
PRINT "BASE128 : "; a$
b$ = B128to256(a$)
PRINT "BASE256 : "; b$

FUNCTION B256to128$ (text$)
    l = 8 * LEN(text$)
    REDIM A(l) AS _BYTE 'bit array, sized here because the posted listing never declares it
    'convert the text to the 8 bit array
    FOR i = 1 TO LEN(text$)
        b = ASC(text$, i)
        p = (i - 1) * 8 + 1
        FOR j = 0 TO 7
            IF b AND (2 ^ j) THEN A(p + j) = 1 ELSE A(p + j) = 0
        NEXT
    NEXT
    'convert the array to 7bit strings
    FOR i = 1 TO l STEP 7
        b = 0
        FOR j = 6 TO 0 STEP -1
            IF i + j <= l THEN '<= rather than <, so the very last bit isn't dropped
                IF A(i + j) THEN b = b + (2 ^ j)
            END IF
        NEXT
        b = b + 45
        t$ = t$ + CHR$(b)
    NEXT
    B256to128 = t$
END FUNCTION

FUNCTION B128to256$ (text$)
    l = 7 * LEN(text$)
    REDIM A(l) AS _BYTE 'bit array, sized here because the posted listing never declares it
    'unpack each 7 bit value into the bit array
    FOR i = 1 TO LEN(text$)
        b = ASC(text$, i) - 45
        FOR j = 0 TO 6
            p = p + 1: IF p > l THEN EXIT FOR
            IF b AND (2 ^ j) THEN A(p) = 1 ELSE A(p) = 0
        NEXT
    NEXT
    'convert the array to 8bit strings
    p = 0
    FOR i = 1 TO l STEP 8
        b = 0: n = 0
        FOR j = 0 TO 7
            p = p + 1
            IF p > l THEN EXIT FOR
            IF A(p) THEN b = b + (2 ^ j)
            n = n + 1
        NEXT
        'a short final group of all zero bits is padding, so don't emit it as CHR$(0)
        IF n = 8 OR b <> 0 THEN t$ = t$ + CHR$(b)
    NEXT
    B128to256 = t$
END FUNCTION

I, myself, might start using this for several things.  The overhead is about as small as we can possibly get it while still keeping a format that works in a DATA statement without causing the IDE any problems.  Hex would double the length of the string, from 6 characters to 12, whereas this only grows it to 8/7 of its original size (7 characters for that same 6-byte string).  :)
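
To put numbers on that comparison, here is a small sketch that prints the encoded length for a few input sizes. It assumes no padding is stripped, and the variable names are only for illustration:

```qb64
'Encoded length in characters for n input bytes.
PRINT "bytes", "hex", "base128"
FOR n = 1 TO 7
    hexLen = 2 * n 'two hex digits per byte
    b128Len = (8 * n + 6) \ 7 '8n bits in groups of 7, rounded up
    PRINT n, hexLen, b128Len
NEXT
```

For 6 bytes this gives 12 hex characters against 7 base-128 characters, and 7 bytes pack exactly into 8 characters.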

(There are probably much faster ways to do this with the new bit shifting routines, but they're not in the official language yet, so I stuck with something simple to work with -- just convert the text to a bit array, and then read the bits as needed/wanted from that array to make our base 128 or base 256 characters.)
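
For comparison, here is a rough sketch of the same packing done without the intermediate bit array. It does not use the bit shifting routines mentioned above; it assumes plain integer multiply and divide are good enough, and the name Pack128 is made up for this example. It is meant to produce the same characters as B256to128 with no padding stripped, but treat it as an illustration rather than a tested replacement.

```qb64
PRINT Pack128("A") 'prints n- (one data character plus one padding character)

FUNCTION Pack128$ (text$)
    DIM acc AS LONG, nbits AS LONG, i AS LONG
    FOR i = 1 TO LEN(text$)
        'place the new byte above the bits still waiting in acc
        acc = acc + ASC(text$, i) * (2 ^ nbits)
        nbits = nbits + 8
        DO WHILE nbits >= 7
            t$ = t$ + CHR$((acc AND 127) + 45) 'emit the low 7 bits
            acc = acc \ 128 'and drop them from the accumulator
            nbits = nbits - 7
        LOOP
    NEXT
    'whatever is left (1 to 6 bits) becomes a final, zero-padded character
    IF nbits > 0 THEN t$ = t$ + CHR$(acc + 45)
    Pack128 = t$
END FUNCTION
```

The accumulator never holds more than 14 bits at a time, so a LONG is plenty.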

Offline Petr

  • Forum Resident
  • Posts: 1720
  • The best code is the DNA of the hops.
    • View Profile
Re: BASE64
« Reply #17 on: December 25, 2019, 09:48:13 am »
Thank you Steve, I'll look into it later. I make gifts for children, woman, myself .... I need 6 hands, three legs and 4 heads ... Just a quick look and a try: for a 1-byte input whose value fits in 7 bits or fewer, the expected output is 7 bits, not 2 bytes. Specifically, try entering "!" [00100001], which is 6 bits long; your function returns 2 bytes. That doesn't seem to be okay.

For 7-bit encoding, 7 bytes of input (8 * 7 = 56 bits) are used,
which are written as 8 seven-bit numbers. If the input has a shorter input, the missing bits are filled with zeroes to the left.

SMcNeill
Re: BASE64
« Reply #18 on: December 25, 2019, 10:37:12 am »
It’s just a case of making certain all data is preserved.  The most you’d ever save is a single byte, so it doesn’t seem like it’s worth the hit to performance to strip off those leading 0’s.

Space is CHR$(32), which is &B00100000....   As long as it’s a single character, we could strip off those leading 0’s and write it as 100000, but if we’re dealing with 2 spaces in a row, we’d have to preserve at least one of them.

00100000,00100000 would translate to: 00,1000000,0100000...  Again, only 1 real byte we can save, from those leading 2 bits in the left byte.

00100000,00100000,00100000 would translate to: 001,0000000,1000000,0100000...  At 3 bytes of spaces, we no longer save that single byte of leftover 0’s, as we now have a significant 1 in the leftover group.

The most we’d ever have is a single byte of padding, and I’m not that concerned over saving that singular byte. It doesn’t bother me to keep it in there, to keep the conversion process simple and efficient.
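
The regrouping in those examples can be reproduced with a short sketch. Group7 is a hypothetical helper written only for this illustration; it works on a string of "0" and "1" characters and splits it from the right, the same way the examples above are written out (it is not the bit order the conversion functions use internally).

```qb64
FOR n = 1 TO 3
    bits$ = ""
    FOR i = 1 TO n
        bits$ = bits$ + "00100000" 'CHR$(32) written out in binary
    NEXT
    PRINT Group7(bits$)
NEXT
'prints:
'0,0100000
'00,1000000,0100000
'001,0000000,1000000,0100000

FUNCTION Group7$ (bits$)
    'split a binary digit string into groups of 7, starting from the right
    r$ = bits$
    DO WHILE LEN(r$) > 7
        g$ = "," + RIGHT$(r$, 7) + g$
        r$ = LEFT$(r$, LEN(r$) - 7)
    LOOP
    Group7 = r$ + g$
END FUNCTION
```

Only when the leftover group on the left is all zeroes is there a character to save.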

A simple string check can remove that extra byte, if it’s really bothersome:  IF LEN(converted_text$) MOD 8 <> 0 AND RIGHT$(converted_text$, 1) = CHR$(45) THEN 'strip off that extra character, which is padding of unused 0’s.



Quote
If the input has a shorter input, the missing bits are filled with zeroes to the left.

Either add padding at decode time, or leave padding at encode time — both are perfectly valid means of converting.  I’m just doing the second, rather than the first.  ;)
« Last Edit: December 25, 2019, 10:47:29 am by SMcNeill »

SMcNeill
Re: BASE64
« Reply #19 on: December 25, 2019, 03:30:24 pm »
Took a few moments to write up the change for you, as I mentioned above, now that Christmas lunch is over and everyone is just lazing around and nodding off napping.

Code: QB64: [Select]
SCREEN _NEWIMAGE(800, 600, 32)

FOR i = 0 TO 25 'Print A to Z, from the alphabet
    text$ = text$ + CHR$(65 + i)
    a$ = B256to128(text$)
    PRINT "B128: "; a$;
    LOCATE , 40
    b$ = B128to256(a$)
    PRINT "B256: "; b$
    SLEEP
NEXT

FUNCTION B256to128$ (text$)
    l = 8 * LEN(text$)
    REDIM A(l) AS _BYTE 'bit array, sized here because the posted listing never declares it
    'convert the text to the 8 bit array
    FOR i = 1 TO LEN(text$)
        b = ASC(text$, i)
        p = (i - 1) * 8 + 1
        FOR j = 0 TO 7
            IF b AND (2 ^ j) THEN A(p + j) = 1 ELSE A(p + j) = 0
        NEXT
    NEXT
    'convert the array to 7bit strings
    FOR i = 1 TO l STEP 7
        b = 0
        FOR j = 6 TO 0 STEP -1
            IF i + j <= l THEN '<= rather than <, so the very last bit isn't dropped
                IF A(i + j) THEN b = b + (2 ^ j)
            END IF
        NEXT
        b = b + 45
        t$ = t$ + CHR$(b)
    NEXT
    IF LEN(t$) MOD 8 <> 0 AND RIGHT$(t$, 1) = "-" THEN t$ = LEFT$(t$, LEN(t$) - 1)
    B256to128 = t$
END FUNCTION

FUNCTION B128to256$ (text$)
    l = 7 * LEN(text$)
    REDIM A(l) AS _BYTE 'bit array, sized here because the posted listing never declares it
    'unpack each 7 bit value into the bit array
    FOR i = 1 TO LEN(text$)
        b = ASC(text$, i) - 45
        FOR j = 0 TO 6
            p = p + 1: IF p > l THEN EXIT FOR
            IF b AND (2 ^ j) THEN A(p) = 1 ELSE A(p) = 0
        NEXT
    NEXT
    'convert the array to 8bit strings
    p = 0
    FOR i = 1 TO l STEP 8
        b = 0: n = 0
        FOR j = 0 TO 7
            p = p + 1
            IF p > l THEN EXIT FOR
            IF A(p) THEN b = b + (2 ^ j)
            n = n + 1
        NEXT
        'a short final group of all zero bits is padding, so don't emit it as CHR$(0)
        IF n = 8 OR b <> 0 THEN t$ = t$ + CHR$(b)
    NEXT
    B128to256 = t$
END FUNCTION

The only real change here was the addition of this one line to strip off that extra byte for padding:

Code: [Select]
    IF LEN(t$) MOD 8 <> 0 AND RIGHT$(t$, 1) = "-" THEN t$ = LEFT$(t$, LEN(t$) - 1)
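
To see which strings that check actually trims, here is a small sketch that wraps the same test in a helper. StripPad is a made-up name, and the sample strings are just hand-picked inputs for illustration:

```qb64
PRINT StripPad("n-") 'prints n (length 2, ends in the padding character)
PRINT StripPad("Mm-.") 'prints Mm-. (last character is not "-")
PRINT StripPad("MmMmMmM-") 'prints MmMmMmM- (length 8 is a full block, so it is kept)

FUNCTION StripPad$ (t$)
    'drop one trailing "-" unless the string is a whole number of 8-character blocks
    IF LEN(t$) MOD 8 <> 0 AND RIGHT$(t$, 1) = "-" THEN
        StripPad = LEFT$(t$, LEN(t$) - 1)
    ELSE
        StripPad = t$
    END IF
END FUNCTION
```

One thing to keep in mind with this kind of trimming: input that really ends in CHR$(0) can no longer be told apart from padding, so it suits text better than arbitrary binary data.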

Petr
Re: BASE64
« Reply #20 on: December 25, 2019, 04:42:57 pm »
So, I took my previous program, edited it, set it to start from CHR$(40), and it works.

Here is the first output of Base128. This is just a decoder for the encoded content. The code can't be copied through this forum's code block.
It's possible it won't work for you, because characters > 127 are used for the encoding. Try this ZIP file, which contains the BAS and EXE.

* Base128.zip (Filesize: 767.02 KB, Downloads: 125)
« Last Edit: December 25, 2019, 05:12:01 pm by Petr »