Author Topic: A new take on arrays and data  (Read 4281 times)

0 Members and 1 Guest are viewing this topic.

Offline STxAxTIC

  • Library Staff
  • Forum Resident
  • Posts: 1091
  • he lives
    • View Profile
A new take on arrays and data
« on: April 05, 2020, 09:01:06 pm »
Hello all,

I hope you're sitting down... Grab a coffee too.

Motivation

So a good place to pick up this thought is from a thread Terry started, namely REDIM _PRESERVE bug at https://www.qb64.org/forum/index.php?topic=2416.0. Down in the replies, I wrote:
Quote
This is why I wish I started working on a really nice linked list library like a year ago. I have yet to develop this thought properly, but I suspect that linked lists can get us around this redim problem, and god willing, *all* problems surrounding arrays: mixing types and so on. I already use the technique on a case-by-case basis when I have a program that really demands a lot of its data - namely 3D levels and text editing to name my big two.

I got busy with this problem shortly after. Today I claim to have made good progress on creating a fast and universal library for storing and structuring data in QB64 programs. The approach doesn't attempt to solve the existing problems with REDIM _PRESERVE and its cousins. Instead, I erase that question entirely, along with many other questions that arise with it. The basic idea here is the linked list, which you can glimpse at Wikipedia: https://en.wikipedia.org/wiki/Linked_list. The prototype introduced here goes a bit beyond that though, and word from our local science community likens this more to a "graph" than a list. Whatever we call it in the end won't matter I guess.

Hard Array v.s. Soft Array

Let us define a "hard" array as a "traditional" array, as one would normally create in a program. While these are the best thing we have for structured and repetitious data, Terry points out that arrays, particularly of several dimensions, are clunky to reshape once they're carved out in memory. Nonetheless, the program must store its data somewhere, and this prototype is no exception. The bare-bones data in a program is still stored in "hard" arrays, strictly one-dimensional, as in:

Code: QB64: [Select]
  1. REDIM SHARED IntegerData(0) AS INTEGER
  2. REDIM SHARED StringData(0) AS STRING
  3. REDIM SHARED DoubleData(0) AS DOUBLE

Now, if all a program needs is a simple one-dimensional array for a given data type, we're already done... But what if we want an array that mixes types, has an unequal number of rows per column, contains other arrays, and so on? This is solved by a new structure I call a Soft Array. A soft array is a collection of Elements that contain pointers to (i) data stored in hard arrays, and (ii) neighboring elements. Any given element has a unique identity, a variable indicating its data type, a variable containing an index in a hard array, and up to four connections with other elements:

Code: QB64: [Select]
  1. TYPE Element
  2.     Identity AS LONG '   Address
  3.     Species AS STRING '  Data type
  4.     Reference AS LONG '  Pointer to hard array index
  5.     North AS LONG '
  6.     South AS LONG '
  7.     East AS LONG '
  8.     West AS LONG '       (Orientation)

It turns out this is (more than) all one needs to conjure any complicated data structure one may imagine. Don't be deceived by the labels "North", "South", "East", "West" because the "landscape" available for element linkage is NOT a euclidean space. This is some kind of non-differentiable manifold... Maybe it's a "graph" after all, but I digress...

Data Assimilation

The "hard" storage arrays only contain unique data points. For example, to add a new integer into the data, the following function is executed internally:

Code: QB64: [Select]
  1. FUNCTION NewIntegerData (x AS INTEGER)
  2.     DIM TheReturn AS LONG
  3.     DIM k AS LONG
  4.     TheReturn = -1
  5.     FOR k = 1 TO UBOUND(IntegerData)
  6.         IF (IntegerData(k) = x) THEN
  7.             TheReturn = k
  8.             EXIT FOR
  9.         END IF
  10.     NEXT
  11.     IF (TheReturn = -1) THEN
  12.         REDIM _PRESERVE IntegerData(UBOUND(IntegerData) + 1)
  13.         IntegerData(UBOUND(IntegerData)) = x
  14.         TheReturn = UBOUND(IntegerData)
  15.     END IF
  16.     NewIntegerData = TheReturn

With the above optimization, the one array that contains all (for instance) string data won't have multiple copies of the same field, such "Name".

Link Traversal

Navigation through a soft array is less obvious than through a hard array. To be definite, we must completely do away with the idea of a FOR loop that steps through the array indices, and replace that notion with following pointers to the child elements until all elements have been touched.

For example, the simple procedure

Code: QB64: [Select]
  1. FOR i = 1 to UBOUND(MyArray)
  2.     PRINT MyArray(i)

is replaced by this recursive mess:

Code: QB64: [Select]
  1. SUB PrintSoftArray (x AS LONG)
  2.     DIM k AS LONG
  3.     DIM t AS STRING
  4.     t = ""
  5.     IF (x <> -1) THEN
  6.         k = SoftArray(x).FirstElement
  7.         t = t + ListElementsRecur$(0, k)
  8.     END IF
  9.     t = LEFT$(t, LEN(t) - 1)
  10.     PRINT t
  11.  
  12. FUNCTION ListElementsRecur$ (i AS INTEGER, x AS LONG)
  13.     DIM TheReturn AS STRING
  14.     DIM s AS LONG
  15.     DIM e AS LONG
  16.     TheReturn = TheReturn + SPACE$(i) + Literal$(x) + CHR$(10)
  17.     s = Elements(x).South
  18.     e = Elements(x).East
  19.     IF (e <> -1) THEN
  20.         TheReturn = TheReturn + ListElementsRecur$(i + 2, e)
  21.     END IF
  22.     IF (s <> -1) THEN
  23.         TheReturn = TheReturn + ListElementsRecur$(i, s)
  24.     END IF
  25.     ListElementsRecur$ = TheReturn
  26.  
  27. FUNCTION Literal$ (x AS LONG)
  28.     DIM TheReturn AS STRING
  29.     TheReturn = ""
  30.     SELECT CASE Elements(x).Species
  31.         CASE "integer"
  32.             TheReturn = LTRIM$(RTRIM$(STR$(IntegerData(Elements(x).Reference))))
  33.         CASE "string"
  34.             TheReturn = StringData(Elements(x).Reference)
  35.         CASE "double"
  36.             TheReturn = LTRIM$(RTRIM$(STR$(DoubleData(Elements(x).Reference))))
  37.     END SELECT
  38.     Literal$ = TheReturn

Conclusions (for now)

I could go on and on about what this thing can do. You've already seen this idea implemented in a few of my works - this is my attempt to lift the good ideas out and export them for other uses. If pictures speak to you as much as words, the screenshot attached attempts to draw out what I've explained.

Now, without further delay, the full code as of now:

Code: QB64: [Select]
  1. See post marked as Best Answer.
links.png
* links.png (Filesize: 41.2 KB, Dimensions: 800x600, Views: 325)
« Last Edit: April 10, 2020, 01:01:53 am by STxAxTIC »
You're not done when it works, you're done when it's right.

Offline Cobalt

  • QB64 Developer
  • Forum Resident
  • Posts: 878
  • At 60 I become highly radioactive!
    • View Profile
Re: A new take on arrays and data
« Reply #1 on: April 05, 2020, 09:47:09 pm »
I think I just shat my brains out my eyes......

are you attempting to push those 'linked lists' your always talking about?
Granted after becoming radioactive I only have a half-life!

Offline Dimster

  • Forum Resident
  • Posts: 500
    • View Profile
Re: A new take on arrays and data
« Reply #2 on: April 06, 2020, 11:41:09 am »
Hi StxAxTIC - This is a little over my head so hopefully the answer to this question isn't something obvious but - if Steve or any of the others in the example data base should have multiple homes in various countries, can this array just expand the Location field, or would you need multiple new entries for Steve added to the end of the array, or is this array fixed from the start (ie need to anticipate the possibility of more Locations for each person at the formation of the array(s)?

Offline STxAxTIC

  • Library Staff
  • Forum Resident
  • Posts: 1091
  • he lives
    • View Profile
Re: A new take on arrays and data
« Reply #3 on: April 06, 2020, 06:36:35 pm »
Good question Dimster - so in this scheme you can insert anything, anywhere - even whole arrays within arrays. I play with this in a trivial way in the third example (my code is updated since first post), but for now those kind of moves look like this:

Code: QB64: [Select]
  1. SUB DemoTreeEdit
  2.     DIM a AS LONG
  3.     DIM b AS LONG
  4.     DIM c AS LONG
  5.  
  6.     a = NewSoftArray(0, "Tree Edit Test")
  7.     a = LinkEast(a, NewStringElement("QB64 Buddy"))
  8.     a = LinkEast(a, NewStringElement("Handle"))
  9.     b = LinkEast(a, NewStringElement("flukiluke"))
  10.     a = LinkSouth(a, NewStringElement("Name"))
  11.     b = LinkEast(a, NewStringElement("Luke C."))
  12.     a = LinkSouth(a, NewStringElement("Country"))
  13.     b = LinkEast(a, NewStringElement("Australia"))
  14.     c = LinkEast(b, NewStringElement("Locality"))
  15.     b = LinkEast(c, NewStringElement("Down Under"))
  16.     a = LinkSouth(a, NewStringElement("Birthyear"))
  17.     b = LinkEast(a, NewIntegerElement(1523))
  18.     c = LinkSouth(b, NewStringElement("May?"))
  19.  
  20.     ' Display and query tests
  21.     CALL PrintSoftArray(ArrayId("Tree Edit Test"))
  22.     PRINT
  23.  
  24.     PRINT "Inserting `Get it?' into list..."
  25.     a = InsertEast(SeekString("Down Under", FromLabel("Tree Edit Test"), 1), NewStringElement("Get it?"))
  26.     PRINT "Adding new entry to bottom of list..."
  27.     a = InsertSouth(SeekString("QB64 Buddy", FromLabel("Tree Edit Test"), 1), NewStringElement("QB64 Enemy"))
  28.     PRINT "Editing Birthyear..."
  29.     a = EditIntegerReference(StepUsing(SeekString("Birthyear", FromLabel("Tree Edit Test"), 1), "e"), 1855)
  30.     PRINT "Deleting a few entries under Country..."
  31.     a = LinkEast(SeekString("Country", FromLabel("Tree Edit Test"), 1), SeekString("Down Under", FromLabel("Tree Edit Test"), 1))
  32.     PRINT "Unlinking Name (and child elements)..."
  33.     a = Unlink(SeekString("Name", FromLabel("Tree Edit Test"), 1))
  34.  
  35.     PRINT
  36.     CALL PrintSoftArray(ArrayId("Tree Edit Test"))
ss.png
* ss.png (Filesize: 10.95 KB, Dimensions: 642x625, Views: 268)
You're not done when it works, you're done when it's right.

Offline Ashish

  • Forum Resident
  • Posts: 630
  • Never Give Up!
    • View Profile
Re: A new take on arrays and data
« Reply #4 on: April 07, 2020, 08:19:08 am »
For those who having a bit trouble about linked lists (as I had :) ), here is a short video -


if (Me.success) {Me.improve()} else {Me.tryAgain()}


My Projects - https://github.com/AshishKingdom?tab=repositories
OpenGL tutorials - https://ashishkingdom.github.io/OpenGL-Tutorials

Offline Ashish

  • Forum Resident
  • Posts: 630
  • Never Give Up!
    • View Profile
Re: A new take on arrays and data
« Reply #5 on: April 07, 2020, 08:23:58 am »
This is cool! I will experiment with this soon...
if (Me.success) {Me.improve()} else {Me.tryAgain()}


My Projects - https://github.com/AshishKingdom?tab=repositories
OpenGL tutorials - https://ashishkingdom.github.io/OpenGL-Tutorials

Offline luke

  • Administrator
  • Seasoned Forum Regular
  • Posts: 324
    • View Profile
Re: A new take on arrays and data
« Reply #6 on: April 07, 2020, 09:00:05 am »
Off-the-cuff idea:

Instead of having explicit North/South/East/West, each element can have as many linked elements as you want by having a `Links AS STRING` then doing something like:
Code: [Select]
DEFLNG A-Z

set_link a.Links, 2, 10
set_link a.Links, 1, 12
PRINT get_link(a.Links, 2)
PRINT get_link(a.Links, 1)

'Set the nth link of s$ to l
SUB set_link (s$, n, l)
    p = n * LEN(l) + 1
    IF LEN(s$) < p THEN s$ = s$ + STRING$(p + LEN(l) - LEN(s$) - 1, MKL$(-1))
    MID$(s$, p) = MKL$(l)
END SUB

'Get the nth link in s$
FUNCTION get_link (s$, n)
    p = n * LEN(get_link) + 1
    IF LEN(s$) < p THEN get_link = -1 ELSE get_link = CVL(MID$(s$, p, LEN(get_link)))
END FUNCTION

Offline STxAxTIC

  • Library Staff
  • Forum Resident
  • Posts: 1091
  • he lives
    • View Profile
Re: A new take on arrays and data
« Reply #7 on: April 07, 2020, 10:10:29 am »
I like that idea Luke, definitely saving some potential for it. I had reason to choose the NSEW scheme based on symmetry and minimalism. Its turning out that I only need two fundamental directions to fake any number of array dimensions I want, hence the bias on South and East in any recursive tree traversal I do. The North and West directions are there for symmetry reasons basically - backward traversal. If I generalize the number of links per node to N, it will really be 2N to keep the symmetry. No big deal until I remember that I'm storing 2N+2 Long variables per element. The payoff is delayed in that case.

Anyway your idea is well received, it's been on my mind for down the road as needed.
« Last Edit: April 07, 2020, 10:32:06 am by STxAxTIC »
You're not done when it works, you're done when it's right.

Offline Dimster

  • Forum Resident
  • Posts: 500
    • View Profile
Re: A new take on arrays and data
« Reply #8 on: April 07, 2020, 10:53:25 am »
STxAxTIC I'm getting an illegal string coversion error on this line.
Code: QB64: [Select]
  1.      a = NewSoftArray(0, "Tree Edit Test")  
and on this code for Luke's
Code: QB64: [Select]
  1. set_link a.Links, 2, 10
Saying it requires a string on that line.

Not sure exactly what I would need to do to over come those errors. Is it just a matter of adding some DATA lines for those arrays?

Offline luke

  • Administrator
  • Seasoned Forum Regular
  • Posts: 324
    • View Profile
Re: A new take on arrays and data
« Reply #9 on: April 07, 2020, 11:21:21 am »
and on this code for Luke's
Code: QB64: [Select]
  1. set_link a.Links, 2, 10
Saying it requires a string on that line.
It was only a sketch of an idea; a.Links is a string as implied above.

Offline STxAxTIC

  • Library Staff
  • Forum Resident
  • Posts: 1091
  • he lives
    • View Profile
Re: A new take on arrays and data
« Reply #10 on: April 07, 2020, 11:42:22 am »
Dimster it looks like you tried to paste Luke's sample into the IDE next to my code or didnt copy the whole box. The box in the first post should run as is. (@ the bottom)
« Last Edit: April 07, 2020, 12:34:24 pm by STxAxTIC »
You're not done when it works, you're done when it's right.

Offline _vince

  • Seasoned Forum Regular
  • Posts: 422
    • View Profile
Re: A new take on arrays and data
« Reply #11 on: April 08, 2020, 09:38:21 am »
I guess it is a quadruply linked list, another added direction of freedom to the doubly linked list. You could always generalize to a n-linked list, which might be what luke was getting at, but each k=1,...,n would be a dimension of direction, like a k-axis, so each k=1,...,n would correspond to two pointers, left/right or up/down or rather +k/-k.

I've never seen this one before but one structure that's not unheard of is a linked list of trees. Binary trees, heaps, etc are handy for fast lookup and shuffling of particularly sorted data. This gives you a coarse LL for lookup and a potentially more efficient tree for finer traversing. 

Offline Dimster

  • Forum Resident
  • Posts: 500
    • View Profile
Re: A new take on arrays and data
« Reply #12 on: April 08, 2020, 10:07:59 am »
I've been trying to visualize the structure - are we talking about multi 2 dimensional array, like lists of data with each array linked by a pointer, or is this a multi dimensional array, like a cube or group of cube like arrays, each connected by a pointer? If it's cube like, wouldn't that take massive memory to store data?

Offline Cobalt

  • QB64 Developer
  • Forum Resident
  • Posts: 878
  • At 60 I become highly radioactive!
    • View Profile
Re: A new take on arrays and data
« Reply #13 on: April 08, 2020, 10:49:37 am »
How about you make a video tutorial on this, Bill?
Granted after becoming radioactive I only have a half-life!

Offline STxAxTIC

  • Library Staff
  • Forum Resident
  • Posts: 1091
  • he lives
    • View Profile
Re: A new take on arrays and data
« Reply #14 on: April 10, 2020, 01:00:17 am »
Hello Dimster

So the best way to visualize the structure, I was hoping, is in the PNG image I attached at the top. The only arrays in the whole program are just one-dimensional arrays for holding strings, ints, doubles, and so on.

The meat and potatoes of the thing is in a UDT called an "element". Elements contain NOTHING but (i) a unique address for the element, (ii) a pointer to the index on the array holding a string, int, or double, (iii) a "species" variable saying which array to point to, and (iv) up to four more pointers to head over to different elements. Stare at the picture and look at the example drawn out - it will click.

@Cobolt - we'll see about video. There are so many good videos on linked lists already, but if I feel that my documentation is lacking at the end of this, I'll do a video...
You're not done when it works, you're done when it's right.