Topic: Variable-length strings in TYPEs

luke (Administrator)
« on: October 23, 2018, 06:47:31 am »
Q: How long is a piece of string?
A: As long as it needs to be.

Soon you'll be able to do something like this:
Code: QB64:
TYPE t
    a AS INTEGER
    b AS STRING
END TYPE

DIM a AS t
DIM b(20) AS t

That is, you'll be able to have strings of variable length inside a TYPE, have arrays of them, and do all the usual stuff you'd expect with TYPEs. However, GET and PUT (as in writing to a binary file) don't natively make sense, because they rely on knowing how many bytes make up the TYPE, and now TYPEs are variable in size. So, should we:
 - Disallow using GET/PUT to a binary file with these variable-length TYPEs?
 - Also write out length information that allows the data to be read back in correctly?
 - Something else?

I'm open to ideas.

RhoSigma (QB64 Developer)
« Reply #1 on: October 23, 2018, 08:29:07 am »
How about an approach like the one in the very old language BCPL, which puts the string's length into the first byte of the string?

Of course, a single byte would limit us to a maximum of 255 chars + the length byte = a 256-byte memory chunk, which is a nice value because it's a power of 2.

In QB64 we would need a LONG to store the length, because of QB64's limit on variable-length string size, so our string would look like MKL$(length) + String.
However, this must be handled internally somehow, so the user doesn't have to manually skip that 4-byte length marker.
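
To make the MKL$(length) + String idea concrete, here is a rough sketch of doing it by hand with today's GET and PUT. The file name and the WriteLString/ReadLString$ names are invented for this example, and nothing here is meant to show how QB64 would implement it internally.

```qb64
' Sketch only: emulate a length-prefixed variable-length string by hand.
' The file name and the SUB/FUNCTION names are made up for this example.
DIM msg AS STRING, result AS STRING

msg = "Hello, variable-length world"

OPEN "lenprefix.tmp" FOR BINARY AS #1
WriteLString 1, msg
SEEK #1, 1 ' rewind to the first byte of the file
result = ReadLString$(1)
CLOSE #1
KILL "lenprefix.tmp"

PRINT LEN(result); result

SUB WriteLString (f AS LONG, s AS STRING)
    DIM buf AS STRING
    buf = MKL$(LEN(s)) + s ' 4-byte length marker followed by the data
    PUT #f, , buf
END SUB

FUNCTION ReadLString$ (f AS LONG)
    DIM marker AS STRING * 4
    DIM s AS STRING
    GET #f, , marker ' read the 4-byte length marker
    s = SPACE$(CVL(marker)) ' size the buffer so GET reads exactly that many bytes
    GET #f, , s
    ReadLString$ = s
END FUNCTION
```

The caller never sees the marker, which is the "handled internally" part; a built-in version would do the same skipping inside GET and PUT.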

If it's just for the overall length of the entire TYPE, to be able to use PUT/GET on it, an IFF file approach would be my choice: https://en.wikipedia.org/wiki/Interchange_File_Format
My Projects:   https://qb64forum.alephc.xyz/index.php?topic=809
GuiTools - A graphic UI framework (can do multiple UI forms/windows in one program)
Libraries - ImageProcess, StringBuffers (virt. files), MD5/SHA2-Hash, LZW etc.
Bonus - Blankers, QB64/Notepad++ setup pack

Petr
« Reply #2 on: October 23, 2018, 10:16:44 am »
Hi.

Will the "STRING * count" format remain available for binary files? After this change, will this still work?  B$ = SPACE$(1000): GET file, , B$
Will it be compatible with _MEM at all? (Will I be able to load this kind of field with _MEM?)
Will calling LEN(Array.String) return the length of that string?
If you disable STRING * value for writing to a file with PUT or reading with GET, that is not good.
I would prefer it to be possible to use both kinds of field in a TYPE: the fixed-length one and this new one.

FellippeHeitor (Guest)
« Reply #3 on: October 23, 2018, 10:38:04 am »
New functionality won't replace existing functionality. It's an alternative use.

Petr
« Reply #4 on: October 23, 2018, 10:56:43 am »
Thank you, Fellippe, for the clarification. I wish you success in working on this difficult matter.

FellippeHeitor (Guest)
« Reply #5 on: October 23, 2018, 11:12:17 am »
This is Luke's work, let's all thank him (and be thankful that uni life has its breaks!)

SMcNeill (QB64 Developer)
« Reply #6 on: October 23, 2018, 12:21:07 pm »
My opinion:  Require the user to add a size field before the string in a TYPE.

TYPE foo
    X AS INTEGER
    Y AS INTEGER
    Name.size AS _BYTE ' or INTEGER, LONG, _INTEGER64
    Name AS STRING
END TYPE

Without the variable string NAME.size, we'd toss an error message -- "Variable length string requires size definition in TYPEs".

The advantages of this are:
1) Minimal file size usage.  The user can decide whether 255 bytes is enough for their STRING, or whether they need INTEGER/LONG/_INTEGER64-size strings.
2) It's a visible reference to the data structure of the file.  If the process was automatic, it might lead to confusion if trying to port the file into another program.
3) It could allow the user to only send part of a string to the file.

Example 3:
MyType_String = "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ"
MyType_String.size = 10 'only put a stub of the first 10 characters into the file
PUT #1, , MyType

By default, QB64 would set the size to the data limit (255 for an unsigned byte) or to the size of the string (36 in this case), but it'd give the user a chance to set it lower, if desired.
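
Since Name.size is only proposed syntax, here is a rough hand-written equivalent of Example 3 using separate variables and a 1-byte size field. The variable names and the file name are invented for this sketch.

```qb64
' Sketch only: hand-written equivalent of the proposed size field.
' Separate variables stand in for the TYPE members; all names are made up.
DIM fullText AS STRING, stub AS STRING
DIM nameSize AS _UNSIGNED _BYTE

fullText = "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ"
nameSize = 10 ' only put a stub of the first 10 characters into the file

OPEN "stub.tmp" FOR BINARY AS #1
PUT #1, , nameSize ' 1-byte size field comes first
stub = LEFT$(fullText, nameSize)
PUT #1, , stub ' then the string data, cut to that size

SEEK #1, 1 ' rewind and read the record back
GET #1, , nameSize
stub = SPACE$(nameSize) ' size the buffer from the size field
GET #1, , stub
CLOSE #1
KILL "stub.tmp"

PRINT nameSize; stub
```

The record takes 11 bytes on disk here: 1 for the size field and 10 for the stub.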
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

FellippeHeitor (Guest)
« Reply #7 on: October 23, 2018, 01:58:14 pm »
If you require the user to specify the field size, you still have a fixed-length string. Having to specify it for each record sounds extremely counterproductive.

I'm of the opinion that everything should be handled by QB64. Storing the record length immediately before the string sounds more viable.

SMcNeill (QB64 Developer)
« Reply #8 on: October 23, 2018, 02:13:40 pm »
Quote from FellippeHeitor: "If you require the user to specify the field size, you still have a fixed-length string. Having to specify it for each record sounds extremely counterproductive. I'm of the opinion that everything should be handled by QB64. Storing the record length immediately before the string sounds more viable."

No matter what, you have a fixed length string; even if it's fixed to being the largest your memory can hold.  :P

If you decide QB64 should hold up to a LONG-size string, then that's a required 4 bytes to be written before each variable-length string, which seems wasteful when you're only going to be asking for name fields (or such).  By being able to define the preceding size field as a _BYTE, you only use a single extra byte with each string.  And being able to manually set the .size (if the programmer wants) doesn't change the nature of the variable-length string, except to set a maximum length (shorter strings would still record the smaller value).

It still puts the length before the string; it just adds a little more flexibility to define max length and minimize disk space required.

Cobalt (QB64 Developer)
« Reply #9 on: October 23, 2018, 06:39:10 pm »
Just out of curiosity, how does QB64 normally keep track of the size of variable-length strings? I mean, how does a variable-length string normally get stored and recalled?

Instead of worrying about storing the length, could we use a string terminator character, like tacking CHR$(255) onto the end of the string, so that when the programmer wants to pull that value out and use it, QB knows where the string stops? Would that work when using GET and PUT with a TYPE too, as it would know that the string ends with &HFF and the next byte belongs to the next variable?
Granted after becoming radioactive I only have a half-life!

SMcNeill (QB64 Developer)
« Reply #10 on: October 23, 2018, 07:20:23 pm »
Quote from Cobalt: "Just out of curiosity, how does QB64 normally keep track of the size of variable-length strings? I mean, how does a variable-length string normally get stored and recalled? Instead of worrying about storing the length, could we use a string terminator character, like tacking CHR$(255) onto the end of the string, so that when the programmer wants to pull that value out and use it, QB knows where the string stops? Would that work when using GET and PUT with a TYPE too, as it would know that the string ends with &HFF and the next byte belongs to the next variable?"

The only issue with that is if the user has a string which already contains that character.  Then you'll end up terminating the string at that point.
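
A quick sketch of that failure, assuming a reader that simply stops at the first CHR$(255) it finds. The variable names are made up for the example.

```qb64
' Sketch only: a CHR$(255) terminator cuts off data that itself contains CHR$(255).
DIM payload AS STRING, stored AS STRING, recovered AS STRING
DIM endPos AS LONG

payload = "AB" + CHR$(255) + "CD" ' 5 bytes of data that happen to include &HFF
stored = payload + CHR$(255) ' append the terminator

endPos = INSTR(stored, CHR$(255)) ' the reader stops at the first &HFF
recovered = LEFT$(stored, endPos - 1)

PRINT LEN(payload); LEN(recovered)
```

The payload is 5 bytes long, but only the first 2 bytes ("AB") come back.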

TempodiBasic
« Reply #11 on: October 23, 2018, 07:56:14 pm »
Hi
About variable-length strings in a UDT:
Yes, I agree that if there were an EOS (EndOfString) function like the EOF (EndOfFile) function, the solution could be different.

Using a special marker for the end of the string is a procedural solution. The valid issue noted by Steve can be minimized by using a longer marker (a sequence of 4-8 bytes), which makes a premature end of string rare, but not impossible.

The other option is a field value that defines the length of the string, either exactly (Fellippe's solution, writing the size of the string at the end) or within a range defined by the type of the declared variable (Steve's solution in the UDT).

IMHO all these solutions need an EOS function that QB64 can manage internally in GET# and PUT#.

Thanks for reading
Programming isn't difficult, only it's  consuming time and coffee

SMcNeill (QB64 Developer)
« Reply #12 on: October 23, 2018, 08:30:43 pm »
Quote from TempodiBasic: "Using a field value to define the length of the string exactly (Fellippe's solution, writing the size of the string at the end)"

You can't put it at the end of a string; you'd never know when the string stopped and the length field started.

For example:  "01234567890899"

Now, there are 2 strings in there, each with its size written after it.  What are they, reading sequentially from left to right?

We have no way to know...

Now, if we read right to left, we can decode it easily enough:
"012" 3-bytes
"456789089" 9-bytes

But, instead of reading right to left, why not stick to normal, sequential data and put the size first? 3 "012" 9 "456789089"
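
A rough sketch of reading that size-first layout from left to right. A single decimal digit stands in for the size field, as in the example, and the variable names are made up.

```qb64
' Sketch only: walk size-first records from left to right.
' A single decimal digit stands in for the size field, as in the example above.
DIM packed AS STRING, piece AS STRING
DIM p AS LONG, n AS LONG

packed = "3012" + "9456789089"
p = 1
DO WHILE p <= LEN(packed)
    n = VAL(MID$(packed, p, 1)) ' the size comes first
    piece = MID$(packed, p + 1, n) ' then exactly n bytes of data
    PRINT n; "bytes: "; piece
    p = p + 1 + n ' jump straight to the next size field
LOOP
```

Because each size is read before its data, the loop never has to guess where a string ends.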

Cobalt (QB64 Developer)
« Reply #13 on: October 23, 2018, 08:34:19 pm »
Quote from SMcNeill: "The only issue with that is if the user has a string which already contains that character.  Then you'll end up terminating the string at that point."

That's true. I guess I was thinking of a string only containing printable ASCII characters; &HFF is blank, and normally that would be a space, CHR$(32) [&H20]. But if you're storing numeric values in there as an array, I guess 255 is possible.

Then there is always just padding the beginning of the string by 4 bytes and having QB use this area to hold the length of the string. The programmer wouldn't need to worry about it. It does have the effect of making the smallest non-empty value for that string 5 bytes.

SMcNeill (QB64 Developer)
« Reply #14 on: October 23, 2018, 08:59:39 pm »
And that's if you set a 2GB limit (or 4GB if you use an _UNSIGNED LONG) to store that string length.  Since some folks (like soniaa in the Dim A thread) need to access larger sizes than that, an _INTEGER64 would probably need to be used to avoid any issues and allow for future expansion, so you'd probably have to add 8 bytes per string...

***********************

Here's a different idea:

Append the index size after STRING for the type.

TYPE A
   X AS STRING SIZELIMIT BYTE
   Y AS STRING SIZELIMIT INTEGER
   Z AS STRING SIZELIMIT LONG
   AA AS STRING SIZELIMIT _INTEGER64
END TYPE
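
SIZELIMIT is only proposed syntax, so as a rough illustration of what each choice would cost on disk, this sketch builds the prefix by hand. PrefixFor$ is a made-up helper name, and the 1-byte case assumes the length fits in 0 to 255.

```qb64
' Sketch only: on-disk size of a 5-character string with each prefix width.
' PrefixFor$ is a hypothetical helper; SIZELIMIT itself is not real syntax.
DIM who AS STRING
who = "Steve"

PRINT LEN(PrefixFor$(LEN(who), 1) + who) ' _BYTE prefix
PRINT LEN(PrefixFor$(LEN(who), 2) + who) ' INTEGER prefix
PRINT LEN(PrefixFor$(LEN(who), 4) + who) ' LONG prefix
PRINT LEN(PrefixFor$(LEN(who), 8) + who) ' _INTEGER64 prefix

FUNCTION PrefixFor$ (n AS _INTEGER64, nBytes AS INTEGER)
    SELECT CASE nBytes
        CASE 1: PrefixFor$ = CHR$(n) ' assumes n is in the range 0 to 255
        CASE 2: PrefixFor$ = MKI$(n)
        CASE 4: PrefixFor$ = MKL$(n)
        CASE 8: PrefixFor$ = _MK$(_INTEGER64, n)
    END SELECT
END FUNCTION
```

For a 5-character name the totals are 6, 7, 9 and 13 bytes, so the prefix can easily outweigh a short string.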