QB64.org Forum

Active Forums => QB64 Discussion => Topic started by: luke on October 23, 2018, 06:47:31 am

Title: Variable-length strings in TYPEs
Post by: luke on October 23, 2018, 06:47:31 am
Q: How long is a piece of string?
A: As long as it needs to be.

Soon you'll be able to do something like this:
Code: QB64:
TYPE t
    a AS INTEGER
    b AS STRING
END TYPE

DIM a AS t
DIM b(20) AS t
That is, you'll be able to have strings of variable length inside a TYPE, have arrays of them, and do all the usual things you'd expect with TYPEs. However, GET and PUT (as in writing to a binary file) don't natively make sense, because they rely on knowing how many bytes make up the TYPE, and now the size varies. So, should we:
 - Disallow using GET/PUT to a binary file with these variable length TYPE's?
 - Also write out length information that allows the data to be read back in correctly?
 - Something else?

I'm open to ideas.
Title: Re: Variable-length strings in TYPEs
Post by: RhoSigma on October 23, 2018, 08:29:07 am
How about an approach like the one in the very old language BCPL, which put the string's length into the first byte of the string?

Of course a single byte would limit us to a maximum of 255 characters; with the length byte that makes a 256-byte memory chunk, which is a nice value because it's a power of 2.

In QB64 we would need a LONG to store the length, given QB64's limit on variable-length string size, so our string would look like MKL$(length) + String.
However, this must be handled internally somehow, so the user doesn't have to manually skip that 4-byte length marker.

If it's just for the overall length of the entire TYPE, to be able to use PUT/GET on it, an IFF file approach would be my choice: https://en.wikipedia.org/wiki/Interchange_File_Format
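The MKL$(length) + String layout described above can be sketched in a few lines. This is only an illustration, not how QB64 handles it internally; PackString$ and UnpackString$ are hypothetical helper names.

```qb64
' Sketch only: PackString$ and UnpackString$ are hypothetical helpers.
DIM packed AS STRING, rest AS STRING

' Three strings stored back to back, each with a 4-byte length in front.
packed = PackString$("Hello") + PackString$("") + PackString$("QB64")

rest = packed
DO WHILE LEN(rest) > 0
    PRINT "["; UnpackString$(rest); "]" ' each call consumes one string from rest
LOOP
END

' Prefix s$ with its length stored as a 4-byte LONG.
FUNCTION PackString$ (s$)
    PackString$ = MKL$(LEN(s$)) + s$
END FUNCTION

' Return the first length-prefixed string in buffer$ and remove it from buffer$.
FUNCTION UnpackString$ (buffer$)
    n& = CVL(LEFT$(buffer$, 4))
    UnpackString$ = MID$(buffer$, 5, n&)
    buffer$ = MID$(buffer$, 5 + n&)
END FUNCTION
```

Because the length comes first, the empty string in the middle survives the round trip, and no character value has to be reserved as a marker.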
Title: Re: Variable-length strings in TYPEs
Post by: Petr on October 23, 2018, 10:16:44 am
Hi.

Will the "STRING * count" format remain available for binary files? After this change, will this still work? B$ = SPACE$(1000): GET file, , B$
Will it be compatible with _MEM at all? (Will I be able to load this kind of field with that function?)
Will LEN(Array.String) return the string's length?
If you disable writing STRING * value fields to a file with PUT or reading them with GET, that is not good.
I would prefer it to be possible to use both kinds of entry in a TYPE: the fixed-length one and this new one.
Title: Re: Variable-length strings in TYPEs
Post by: FellippeHeitor on October 23, 2018, 10:38:04 am
New functionality won't replace existing functionality. It's an alternative use.
Title: Re: Variable-length strings in TYPEs
Post by: Petr on October 23, 2018, 10:56:43 am
Thank you Fellippe for clarification. I wish you success in working on this difficult matter.
Title: Re: Variable-length strings in TYPEs
Post by: FellippeHeitor on October 23, 2018, 11:12:17 am
This is Luke's work, let's all thank him (and be thankful that uni life has its breaks!)
Title: Re: Variable-length strings in TYPEs
Post by: SMcNeill on October 23, 2018, 12:21:07 pm
My opinion:  Require the user to add a size field before the string in a TYPE.

TYPE foo
    X AS INTEGER
    Y AS INTEGER
    Name.size AS _BYTE (INTEGER, LONG, _INTEGER64)
    Name AS STRING
END TYPE

Without the variable string NAME.size, we'd toss an error message -- "Variable length string requires size definition in TYPEs".

The advantages of this are:
1) Minimal file size usage.  The user can decide whether 255 bytes would be enough for their STRING, or whether they need integer/long/int64 size strings.
2) It's a visible reference to the data structure of the file.  If the process was automatic, it might lead to confusion if trying to port the file into another program.
3) It could allow the user to only send part of a string to the file.

Example 3:
MyType_String = "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ"
MyType_String.size = 10 'only put a stub of the first 10 characters into the file
PUT #1, , MyType

By default, QB64 would set the size to the data limit (255 for unsigned byte), or the size of the string (36 in this case), but it'd give the user a chance to set it lower, if desired.
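A rough sketch of the one-byte size field, done by hand with today's PUT rather than by the compiler. It assumes an _UNSIGNED _BYTE prefix; WriteShortString and the file name names.bin are hypothetical, not QB64 features.

```qb64
' Sketch only: WriteShortString and names.bin are hypothetical.
IF _FILEEXISTS("names.bin") THEN KILL "names.bin" ' start from an empty file
OPEN "names.bin" FOR BINARY AS #1
WriteShortString 1, "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ"
PRINT LOF(1) ' 1 size byte + 36 characters
CLOSE #1
END

' Write s preceded by a one-byte length, cutting s to 255 characters if needed.
SUB WriteShortString (fileNum AS LONG, s AS STRING)
    DIM size AS _UNSIGNED _BYTE
    DIM chunk AS STRING
    IF LEN(s) > 255 THEN size = 255 ELSE size = LEN(s)
    chunk = LEFT$(s, size)
    PUT #fileNum, , size
    IF size > 0 THEN PUT #fileNum, , chunk
END SUB
```

The overhead is a single byte per string, at the price of silently cutting anything longer than 255 characters.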
Title: Re: Variable-length strings in TYPEs
Post by: FellippeHeitor on October 23, 2018, 01:58:14 pm
If you require the user to inform the field size you still have a fixed-length string. Having to inform it for each record sounds extremely counterproductive.

I'm of the opinion that everything should be handled by QB64. Storing the record length immediately before the string is what sounds more viable.
Title: Re: Variable-length strings in TYPEs
Post by: SMcNeill on October 23, 2018, 02:13:40 pm
Quote
If you require the user to inform the field size you still have a fixed-length string. Having to inform it for each record sounds extremely counterproductive.

I'm of the opinion that everything should be handled by QB64. Storing the record length immediately before the string is what sounds more viable.

No matter what, you have a fixed length string; even if it's fixed to being the largest your memory can hold.  :P

If you decide QB64 should hold up to a LONG-size string, then that's a required 4 bytes to be written before each variable-length string, which seems wasteful when you're only going to be asking for name fields (or such).  By being able to define the preceding field as a BYTE, you only use a single extra byte with each string.  And being able to manually set the .size (if the programmer wants) doesn't change the nature of the variable-length string, except to set a maximum length (shorter strings would still record the smaller value).

It still puts the length before the string; it just adds a little more flexibility to define max length and minimize disk space required.
Title: Re: Variable-length strings in TYPEs
Post by: Cobalt on October 23, 2018, 06:39:10 pm
Just out of curiosity, how does QB64 normally maintain the size of variable-length strings? I mean, how does a variable-length string normally get stored and recalled?

Instead of worrying about storing the length, could we use a string terminator character, such as tacking CHR$(255) onto the end of the string, so that when the programmer wants to pull that value out and use it QB knows where the string stops? Would that work when using GET and PUT with a TYPE too, since it would know that the string ends with &HFF and the next byte belongs to the next variable?
Title: Re: Variable-length strings in TYPEs
Post by: SMcNeill on October 23, 2018, 07:20:23 pm
Quote
Just out of curiosity, just how does it(QB64) maintain the size of variable length strings normally? I mean how does a variable length string normally get stored and recalled?

Instead of worrying about storing the length could we use a string terminator character like tacking chr$(255) to the end of the string, so when the programmer wants to pull that value out and use it QB knows where the string stops? would that work when using GET and PUT with a TYPE too, as it would know that the string ends with &Hff and the next byte belongs to the next variable?

The only issue with that is if the user has a string which already contains that character.  Then you'll end up terminating the string at that point.
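The problem is easy to reproduce. A minimal sketch, assuming CHR$(255) as the terminator and a made-up data string that happens to contain that byte:

```qb64
' Sketch only: shows why a one-byte terminator is unsafe for arbitrary data.
terminator$ = CHR$(255)
original$ = "AB" + CHR$(255) + "CD" ' data that happens to contain the terminator byte
stored$ = original$ + terminator$

' Reading back stops at the first terminator found.
cut& = INSTR(stored$, terminator$)
recovered$ = LEFT$(stored$, cut& - 1)

PRINT LEN(original$); LEN(recovered$) ' 5 versus 2: the string was cut short
```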
Title: Re: Variable-length strings in TYPEs
Post by: TempodiBasic on October 23, 2018, 07:56:14 pm
Hi
About variable-length strings in a UDT:
Yes, I agree that if there were an EOS (EndOfString) function, like the EOF (EndOfFile) function, the solution could be different.

Using a special mark for the end of the string is a procedural solution. The valid issue noted by Steve can be minimized by using a long mark (a sequence of 4-8 bytes), which makes a premature end of string rare, but not impossible.

The alternative is a field value that defines the length of the string, either exactly (Fellippe's solution of writing the size of the string at the end) or within a range set by the type of the declared variable (Steve's solution in the UDT).

IMHO all these solutions need an EOS function that QB64 can manage internally in GET# and PUT#.

Thanks for reading
Title: Re: Variable-length strings in TYPEs
Post by: SMcNeill on October 23, 2018, 08:30:43 pm
Quote
Using a field_value to define lenght of string with accuracy (the solution of Fellippe to write at the end the size of string)

You can't put it at the end of a string; you'd never know when the string stopped and the length field started.

For example:  "01234567890899"

Now, there's 2 strings in there, with the size written after.  What are they, reading sequentially from left to right?

We have no way to know...

Now, if we read right to left, we can decode it easily enough:
"012" 3-bytes
"456789089" 9-bytes

But, instead of reading right to left, why not stick to normal, sequential data and put the size first? 3 "012" 9 "456789089"
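The size-first layout can already be written by hand with PUT and GET on a BINARY file. A minimal sketch, assuming a LONG length prefix and a placeholder file name strings.bin; it is not what QB64 does internally.

```qb64
' Sketch only: strings.bin and the LONG-sized prefix are assumptions.
DIM n AS LONG, s AS STRING

IF _FILEEXISTS("strings.bin") THEN KILL "strings.bin" ' start from an empty file
OPEN "strings.bin" FOR BINARY AS #1
s = "012": n = LEN(s)
PUT #1, , n ' size first...
PUT #1, , s ' ...then the characters
s = "456789089": n = LEN(s)
PUT #1, , n
PUT #1, , s
CLOSE #1

OPEN "strings.bin" FOR BINARY AS #1
DO WHILE SEEK(1) <= LOF(1)
    GET #1, , n ' read the 4-byte length
    s = SPACE$(n) ' size the buffer so GET knows how many bytes to read
    IF n > 0 THEN GET #1, , s
    PRINT n; s
LOOP
CLOSE #1
```

Reading stays strictly left to right: each length says exactly where the next string ends and the next length begins.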
Title: Re: Variable-length strings in TYPEs
Post by: Cobalt on October 23, 2018, 08:34:19 pm
Quote
The only issue with that is if the user has a string which already contains that character.  Then you'll end up terminating the string at that point.

That's true. I guess I was thinking of a string containing only printable ASCII characters. &HFF is blank and normally that would be a space, CHR$(32) [&H20], but if you're storing numeric values in there as an array, I guess 255 is possible.

Then there is always just padding the beginning of the string with 4 bytes and having QB use this area to hold the length of the string. The programmer wouldn't need to worry about it. It does have the effect of making the smallest size possible for that string 5 bytes.
Title: Re: Variable-length strings in TYPEs
Post by: SMcNeill on October 23, 2018, 08:59:39 pm
And that's if you set a 2GB (or 4GB if you use an UNSIGNED LONG) limit on the stored string length.  Since some folks (like soniaa in the Dim A thread) need to access larger sizes than that, an INTEGER64 would probably need to be used to stop any issues and work for future expansion, so you'd probably have to add 8 bytes per string...

***********************

Here's a different idea:

Append the index size after STRING for the type.

TYPE A
   X AS STRING SIZELIMIT BYTE
   Y AS STRING SIZELIMIT INTEGER
   Z AS STRING SIZELIMIT LONG
   AA AS STRING SIZELIMIT _INTEGER64
END TYPE
Title: Re: Variable-length strings in TYPEs
Post by: Cobalt on October 23, 2018, 10:36:58 pm
Quote
And that's if you set a 2GB (or 4GB if you use an UNSIGNED LONG) limit to store that string length).  Since some folks (like soniaa in the Dim A thread) need to access larger sizes than that, an INTEGER64 would probably need to be used to stop any issues and work for future expansion, so you'd probably have to add 8 bytes per string...


Even the longest book ever written comes in at under 10 million (estimated) ASCII characters. And you have to consider memory size (the DIM A thread) as well, and data structure, so a limit of UNSIGNED LONG (because you can't have a negative string length anyway) is more than enough in any practical application. Even at that, you could only have 4 or 5 array elements of that type on most typical computers. While I agree that special occasions may call for a much larger value, most users of QB64 aren't going to have machines that could handle it. And large corporations like Apple or Google that have the means to run machines with terabytes or petabytes of RAM will probably have efficient data structures or custom software anyway; you know this. So just for simplicity my vote is on keeping it small: either a string terminator (0D0A like most text editors, maybe) or a pre-string LOS (Length Of String) integer.

Don't get me wrong, I see where you are coming from; I just don't see the practical, simple side of it.
Title: Re: Variable-length strings in TYPEs
Post by: codeguy on October 24, 2018, 12:38:58 pm
Here's an idea.
A fixed delimiter of any size (in this case a*), called sep$ in the code, separating substrings of arbitrary length.
Code: QB64:
TYPE Identity
    FirstName AS STRING * 256
    Lastname AS STRING * 32
    NamesOfPets AS STRING '* does not work without String * nnnn
END TYPE
DIM qx AS Identity
Del$ = "a*"
' The field starts with the delimiter itself, terminated by CHR$(0), so the function can discover it.
qx.FirstName = Del$ + CHR$(0) + "thea*bomina*bleSnowMa*nWa*sHereVistingSa*squa*tchThisIsARea*llyLongNa*meAndNobodyWa*ntsToWriteThisOutCompletelyAMillionSqua*redTimes"
NthOneToGet& = 7 '7 = llyLongN using Del$ as delimiter (substrings are counted from 0)

PRINT GrabThatNth$(qx.FirstName, NthOneToGet&)
END

FUNCTION GrabThatNth$ (m$, n&)
    ' Find the CHR$(0) that ends the delimiter header.
    FOR I& = 1 TO LEN(m$)
        IF ASC(m$, I&) = 0 THEN EXIT FOR
    NEXT
    sep$ = LEFT$(m$, I& - 1)
    sepLen& = LEN(sep$)
    p& = I& + 1 ' first character after the header
    c& = 0
    DO
        m& = INSTR(p&, m$, sep$)
        IF m& < 1 THEN
            GrabThatNth$ = MID$(m$, p&) ' no more delimiters: return the remainder
            EXIT DO
        ELSEIF c& < n& THEN
            p& = m& + sepLen& ' skip this substring and its delimiter
            c& = c& + 1
        ELSE
            GrabThatNth$ = MID$(m$, p&, m& - p&)
            EXIT DO
        END IF
    LOOP
END FUNCTION
Title: Re: Variable-length strings in TYPEs
Post by: FellippeHeitor on October 24, 2018, 01:14:38 pm
Say what??
Title: Re: Variable-length strings in TYPEs
Post by: SMcNeill on October 24, 2018, 01:23:17 pm
Quote
Say what??

No idea.  I didn't comprehend what Codeguy was referring to either. 
Title: Re: Variable-length strings in TYPEs
Post by: codeguy on October 24, 2018, 02:11:04 pm
The approach I use has the string carry fixed-length delimiters to find the Nth substring, no matter how far away the next delimiter is. The delimiter could be any length so as not to conflict with the individual strings already contained. Maybe a random string with character 0 appended to it, e.g.:
RsYv109!+chr$(0)+some stuff between delimiters+RsYv109!+chr$(0)+thenextString+RsYv109!+chr$(0)...
You get the idea. The lengths of substrings are then limited only by memory. Hope that helps.
Title: Re: Variable-length strings in TYPEs
Post by: TempodiBasic on October 24, 2018, 06:16:01 pm
Quote
Using a special mark like end of string is a procedural solution. The right issue noted by Steve can be minimized using a big mark as EOF (a sequence of 4-8 byte),so we change the case of premature end of string rare but not impossible.

So IMHO Codeguy suggests using this EndOfString mark to tell the individual strings (substrings) apart. It is one way to go if we have a UDT made only of variable-length strings, like:
Code: QB64:
TYPE StringRecord ' type name is a placeholder
    FirstString AS STRING
    SecondString AS STRING
    ThirdString AS STRING
END TYPE

' Using a delimiter of pre-fixed length, QB64 can pack all these strings into one SuperString to PUT# and/or GET#.
' Internally QB64 would perform these operations:

DIM rec AS StringRecord
DIM SuperString AS STRING, ContString AS INTEGER
Delimiter$ = CHR$(0) + CHR$(255) + CHR$(0) + CHR$(255) ' placeholder 4-byte mark
rec.FirstString = "one": rec.SecondString = "two": rec.ThirdString = "three" ' sample data
ContString = 3
SuperString = rec.FirstString + Delimiter$ + rec.SecondString + Delimiter$ + rec.ThirdString + Delimiter$
' PUT #1, , SuperString when writing, or GET #1, , SuperString when reading

FOR cont% = 1 TO ContString
    ' if there is still a Delimiter$, there is a string to its left
    IF INSTR(SuperString, Delimiter$) THEN
        IF cont% = 1 THEN rec.FirstString = ExtractString$(SuperString, Delimiter$)
        IF cont% = 2 THEN rec.SecondString = ExtractString$(SuperString, Delimiter$)
        IF cont% = 3 THEN rec.ThirdString = ExtractString$(SuperString, Delimiter$)
    END IF
NEXT
PRINT rec.FirstString, rec.SecondString, rec.ThirdString
END

FUNCTION ExtractString$ (SupStr$, Delim$)
    Delimt% = INSTR(SupStr$, Delim$) ' finds the end of the first string
    ExtractString$ = LEFT$(SupStr$, Delimt% - 1) ' returns the string to the left of the delimiter
    SupStr$ = MID$(SupStr$, Delimt% + LEN(Delim$)) ' keeps the rest of SupStr$, without the extracted string and its delimiter
END FUNCTION

IMHO this case is like an array of strings
Quote

TYPE
 FirstString AS STRING
 SecondString AS STRING
 ThirdString AS STRING
END TYPE

is equivalent to writing

Quote
CONST FirstString = 1, SecondString =2, ThirdString = 3
DIM ArrayString (FirstString TO ThirdString) AS STRING

IMHO QB64 can manage PUT#/GET# with ArrayString.
So it is possible that QB64 could internally translate a UDT of variable-length strings into an array of strings.

--------
On a related note, how do QBasic/QB4.5/QB7.1 create and manage strings in RAM?
Title: Re: Variable-length strings in TYPEs
Post by: codeguy on October 24, 2018, 09:59:07 pm
Something along the lines of what's pictured. I will upload a doable and faster version with substring splits loaded to an array for FAR easier and faster string lookup. It may have to involve HashTables and multidimensional arrays to be more usable for very large strings and with many substrings. It would greatly reduce the horsepower a CPU needs to extract the 999,368th substring for example.
Title: Re: Variable-length strings in TYPEs
Post by: bplus on October 24, 2018, 10:38:13 pm
Either that or linked lists using _get and _put yet to be created that use linked lists or something else... ;)
Title: Re: Variable-length strings in TYPEs
Post by: pinology on October 26, 2018, 08:33:07 am
kind of along the same lines I was wondering if you could use an array in a type variable something like
type a
  b(20) as integer
  c as string * 5
end type
Title: Re: Variable-length strings in TYPEs
Post by: luke on October 26, 2018, 09:06:36 am
Quote
kind of along the same lines I was wondering if you could use an array in a type variable something like
type a
  b(20) as integer
  c as string * 5
end type

That's on the cards as the next thing to work on
Title: Re: Variable-length strings in TYPEs
Post by: pinology on October 26, 2018, 09:15:04 pm
cool luke, that would come in real handy for putting data into random files and being able to access some of your variables numerically
Title: Re: Variable-length strings in TYPEs
Post by: TerryRitchie on October 27, 2018, 03:02:59 am
Quote
kind of along the same lines I was wondering if you could use an array in a type variable something like
type a
  b(20) as integer
  c as string * 5
end type
That's on the cards as the next thing to work on

Oh man, that would make writing this updated sprite library so much easier, LOL.
Title: Re: Variable-length strings in TYPEs
Post by: luke on October 27, 2018, 09:22:04 am
Making GET and PUT work in any fashion turned out to be much harder and more work than I expected, since I need to find a way to embed UDT structural info into the final executable. Until I do that, I have published the changes allowing for variable length strings in UDT's to the development build of QB64. I would appreciate any attempts to find bugs with it (bugs are likely to be crashes, garbage data in variables, compiler errors when there shouldn't be).
Title: Re: Variable-length strings in TYPEs
Post by: FellippeHeitor on October 27, 2018, 12:15:56 pm
Good news!
Title: Re: Variable-length strings in TYPEs
Post by: Cobalt on October 27, 2018, 10:59:06 pm
Hey, Luke, LEN() always returns 4 regardless how many characters are in the string.
Title: Re: Variable-length strings in TYPEs
Post by: bplus on October 28, 2018, 01:11:07 pm
Hey with variable length strings now allowed in TYPEs we can make fake arrays with strings! :-))

Of course we could do that already with regular strings.