Author Topic: A new take on arrays and data (Read 4496 times)

STxAxTIC · « **on:** April 05, 2020, 09:01:06 pm »

Hello all,

I hope you're sitting down... Grab a coffee too.

Motivation

So a good place to pick up this thought is from a thread Terry started, namely "REDIM _PRESERVE bug" at https://www.qb64.org/forum/index.php?topic=2416.0. Down in the replies, I wrote:

Quote

This is why I wish I started working on a really nice linked list library like a year ago. I have yet to develop this thought properly, but I suspect that linked lists can get us around this redim problem, and god willing, *all* problems surrounding arrays: mixing types and so on. I already use the technique on a case-by-case basis when I have a program that really demands a lot of its data - namely 3D levels and text editing to name my big two.

I got busy with this problem shortly after. Today I claim to have made good progress on creating a fast and universal library for storing and structuring data in QB64 programs. The approach doesn't attempt to solve the existing problems with REDIM _PRESERVE and its cousins. Instead, I erase that question entirely, along with many other questions that arise with it. The basic idea here is the linked list, which you can glimpse at Wikipedia: https://en.wikipedia.org/wiki/Linked_list. The prototype introduced here goes a bit beyond that though, and word from our local science community likens this more to a "graph" than a list. Whatever we call it in the end won't matter I guess.

Hard Array vs. Soft Array

Let us define a "hard" array as a "traditional" array, as one would normally create in a program. While these are the best thing we have for structured and repetitious data, Terry points out that arrays, particularly of several dimensions, are clunky to reshape once they're carved out in memory. Nonetheless, the program must store its data somewhere, and this prototype is no exception. The bare-bones data in a program is still stored in "hard" arrays, strictly one-dimensional, as in:

Code: QB64:

REDIM SHARED IntegerData(0) AS INTEGER
REDIM SHARED StringData(0) AS STRING
REDIM SHARED DoubleData(0) AS DOUBLE

Now, if all a program needs is a simple one-dimensional array for a given data type, we're already done... But what if we want an array that mixes types, has an unequal number of rows per column, contains other arrays, and so on? This is solved by a new structure I call a Soft Array. A soft array is a collection of Elements that contain pointers to (i) data stored in hard arrays, and (ii) neighboring elements. Any given element has a unique identity, a variable indicating its data type, a variable containing an index in a hard array, and up to four connections with other elements:

Code: QB64:

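TYPE Element
    Identity AS LONG '   Address of this element
    Species AS STRING '  Data type: "integer", "string", "double", ...
    Reference AS LONG '  Pointer to hard array index (in the array named by Species)
    North AS LONG '      Identity of the neighbour above (-1 = none)
    South AS LONG '      ... below
    East AS LONG '       ... to the right
    West AS LONG '       ... to the left (Orientation)
END TYPE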

It turns out this is (more than) all one needs to conjure any complicated data structure one may imagine. Don't be deceived by the labels "North", "South", "East", "West", because the "landscape" available for element linkage is NOT a Euclidean space. This is some kind of non-differentiable manifold... Maybe it's a "graph" after all, but I digress...
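To make the linkage concrete, here is a minimal, self-contained sketch. It is not the library's code: it stores strings only, sets Reference equal to Identity (no deduplication), and uses -1 for "no neighbour" as the traversal code in this post does. MakeElement&, JoinEast and JoinSouth are hypothetical helpers. The point is that two directions (East along a row, South down a column) already give a table, and every forward link is mirrored by a back link.

```qb64
' Sketch: a tiny 2 x 2 "table" made of linked Elements.
' Assumptions: string data only, Reference = Identity (no dedup),
' and -1 means "no neighbour".
TYPE Element
    Identity AS LONG
    Reference AS LONG ' index into StringData
    North AS LONG
    South AS LONG
    East AS LONG
    West AS LONG
END TYPE

REDIM SHARED Elements(0) AS Element
REDIM SHARED StringData(0) AS STRING

DIM r1 AS LONG, r2 AS LONG, c AS LONG
r1 = MakeElement&("Name")
c = MakeElement&("Luke C.")
JoinEast r1, c
r2 = MakeElement&("Country")
JoinSouth r1, r2
c = MakeElement&("Australia")
JoinEast r2, c

' Walk down the first column (South), then along each row (East).
DIM rowId AS LONG, cellId AS LONG
rowId = r1
DO WHILE rowId <> -1
    cellId = rowId
    DO WHILE cellId <> -1
        PRINT StringData(Elements(cellId).Reference); " ";
        cellId = Elements(cellId).East
    LOOP
    PRINT
    rowId = Elements(rowId).South
LOOP

' The back-links come for free: "Australia" finds "Country" via West.
PRINT StringData(Elements(Elements(c).West).Reference)

FUNCTION MakeElement& (s AS STRING)
    DIM id AS LONG
    id = UBOUND(Elements) + 1
    REDIM _PRESERVE Elements(id) AS Element
    REDIM _PRESERVE StringData(id) AS STRING
    StringData(id) = s
    Elements(id).Identity = id
    Elements(id).Reference = id
    Elements(id).North = -1
    Elements(id).South = -1
    Elements(id).East = -1
    Elements(id).West = -1
    MakeElement& = id
END FUNCTION

' Link b to the east of a, keeping the reverse (West) pointer in sync.
SUB JoinEast (a AS LONG, b AS LONG)
    Elements(a).East = b
    Elements(b).West = a
END SUB

' Link b to the south of a, keeping the reverse (North) pointer in sync.
SUB JoinSouth (a AS LONG, b AS LONG)
    Elements(a).South = b
    Elements(b).North = a
END SUB
```

Because JoinEast and JoinSouth always write the reverse pointer too, any element can be reached from its neighbour in either direction, which is what the North and West fields buy.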

Data Assimilation

The "hard" storage arrays only contain unique data points. For example, to add a new integer into the data, the following function is executed internally:

Code: QB64:

FUNCTION NewIntegerData (x AS INTEGER)
    DIM TheReturn AS LONG
    DIM k AS LONG
    TheReturn = -1
    ' Reuse the existing slot if this value is already stored
    ' (index 0 is left unused).
    FOR k = 1 TO UBOUND(IntegerData)
        IF (IntegerData(k) = x) THEN
            TheReturn = k
            EXIT FOR
        END IF
    NEXT
    ' Otherwise grow the hard array by one and store the value there.
    IF (TheReturn = -1) THEN
        REDIM _PRESERVE IntegerData(UBOUND(IntegerData) + 1) AS INTEGER
        IntegerData(UBOUND(IntegerData)) = x
        TheReturn = UBOUND(IntegerData)
    END IF
    NewIntegerData = TheReturn
END FUNCTION

With the above optimization, the one array that contains all (for instance) string data won't hold multiple copies of the same field, such as "Name". The cost is a linear search of the array every time a value is added.
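The string version works the same way. This sketch is written by analogy with NewIntegerData (the name InternString& is hypothetical, not taken from the full code) and stores "Name" twice to show the second call reusing the first slot.

```qb64
' Sketch: the string counterpart of NewIntegerData, written by analogy.
' InternString& is a hypothetical name; index 0 stays unused.
REDIM SHARED StringData(0) AS STRING

DIM i1 AS LONG, i2 AS LONG, i3 AS LONG
i1 = InternString&("Name")
i2 = InternString&("Country")
i3 = InternString&("Name") ' already stored: returns the first slot again
PRINT i1; i2; i3
PRINT "Slots used:"; UBOUND(StringData)

FUNCTION InternString& (x AS STRING)
    DIM k AS LONG
    ' Linear scan: reuse the slot if the value is already stored.
    FOR k = 1 TO UBOUND(StringData)
        IF StringData(k) = x THEN
            InternString& = k
            EXIT FUNCTION
        END IF
    NEXT
    ' Not found: grow the hard array by one and store the value.
    REDIM _PRESERVE StringData(UBOUND(StringData) + 1) AS STRING
    StringData(UBOUND(StringData)) = x
    InternString& = UBOUND(StringData)
END FUNCTION
```

Both "Name" calls return 1 and only two slots are used. The trade-off is visible in the loop: every insertion scans the whole array, so adding data gets slower as the array grows.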

Link Traversal

Navigation through a soft array is less obvious than through a hard array. To be definite, we must completely do away with the idea of a FOR loop that steps through the array indices, and replace that notion with following pointers to the child elements until all elements have been touched.

For example, the simple procedure

Code: QB64:

FOR i = 1 TO UBOUND(MyArray)
    PRINT MyArray(i)
NEXT

is replaced by this recursive mess:

Code: QB64:

SUB PrintSoftArray (x AS LONG)
    DIM k AS LONG
    DIM t AS STRING
    t = ""
    IF (x <> -1) THEN
        k = SoftArray(x).FirstElement
        t = t + ListElementsRecur$(0, k)
    END IF
    ' Drop the trailing line feed, but only if something was listed
    ' (LEFT$ with a negative length raises an error).
    IF LEN(t) > 0 THEN t = LEFT$(t, LEN(t) - 1)
    PRINT t
END SUB

' Lists element x, then its East branch indented by two spaces,
' then its South neighbours at the same indentation.
FUNCTION ListElementsRecur$ (i AS INTEGER, x AS LONG)
    DIM TheReturn AS STRING
    DIM s AS LONG
    DIM e AS LONG
    TheReturn = SPACE$(i) + Literal$(x) + CHR$(10)
    s = Elements(x).South
    e = Elements(x).East
    IF (e <> -1) THEN
        TheReturn = TheReturn + ListElementsRecur$(i + 2, e)
    END IF
    IF (s <> -1) THEN
        TheReturn = TheReturn + ListElementsRecur$(i, s)
    END IF
    ListElementsRecur$ = TheReturn
END FUNCTION
 
FUNCTION Literal$ (x AS LONG)
    DIM TheReturn AS STRING
    TheReturn = ""
    SELECT CASE Elements(x).Species
        CASE "integer"
            TheReturn = LTRIM$(RTRIM$(STR$(IntegerData(Elements(x).Reference))))
        CASE "string"
            TheReturn = StringData(Elements(x).Reference)
        CASE "double"
            TheReturn = LTRIM$(RTRIM$(STR$(DoubleData(Elements(x).Reference))))
    END SELECT
    Literal$ = TheReturn
END FUNCTION
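The recursion is not essential. As a sketch, the same listing order (element, then its East branch indented by two, then its South neighbours) can be produced with an explicit stack. This uses a trimmed-down Element type, string data only, the hypothetical names NewNode& and ListIterative$, and a fixed stack capacity of 1000 that is an assumption of the sketch.

```qb64
' Sketch: a non-recursive take on ListElementsRecur$, using an explicit stack.
' Assumptions: string data only, a trimmed-down Element type, -1 = no link,
' and a fixed stack capacity that is plenty for this small example.
TYPE Element
    Reference AS LONG
    South AS LONG
    East AS LONG
END TYPE
REDIM SHARED Elements(0) AS Element
REDIM SHARED StringData(0) AS STRING

DIM n1 AS LONG, n2 AS LONG, n3 AS LONG, n4 AS LONG
n1 = NewNode&("Name"): n2 = NewNode&("Luke C.")
n3 = NewNode&("Country"): n4 = NewNode&("Australia")
Elements(n1).East = n2 '  Name -> Luke C.
Elements(n1).South = n3 ' Name, with Country below it
Elements(n3).East = n4 '  Country -> Australia
PRINT ListIterative$(n1)

FUNCTION NewNode& (s AS STRING)
    DIM id AS LONG
    id = UBOUND(Elements) + 1
    REDIM _PRESERVE Elements(id) AS Element
    REDIM _PRESERVE StringData(id) AS STRING
    StringData(id) = s
    Elements(id).Reference = id
    Elements(id).South = -1
    Elements(id).East = -1
    NewNode& = id
END FUNCTION

FUNCTION ListIterative$ (startId AS LONG)
    DIM stackNode(1 TO 1000) AS LONG
    DIM stackIndent(1 TO 1000) AS LONG
    DIM sp AS LONG, x AS LONG, ind AS LONG, t AS STRING
    IF startId = -1 THEN EXIT FUNCTION
    sp = 1: stackNode(1) = startId: stackIndent(1) = 0
    DO WHILE sp > 0
        ' Pop the next element and emit it at its indentation.
        x = stackNode(sp): ind = stackIndent(sp): sp = sp - 1
        t = t + SPACE$(ind) + StringData(Elements(x).Reference) + CHR$(10)
        ' Push South before East so the East branch is listed first,
        ' which is the same order the recursive version produces.
        IF Elements(x).South <> -1 THEN
            sp = sp + 1: stackNode(sp) = Elements(x).South: stackIndent(sp) = ind
        END IF
        IF Elements(x).East <> -1 THEN
            sp = sp + 1: stackNode(sp) = Elements(x).East: stackIndent(sp) = ind + 2
        END IF
    LOOP
    ' Drop the trailing line feed.
    ListIterative$ = LEFT$(t, LEN(t) - 1)
END FUNCTION
```

For the four elements built here it prints Name, then Luke C. indented by two spaces, then Country and Australia in the same pattern, matching what the recursive version gives for the same links. Pushing South before East is what keeps the East branch first.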

Conclusions (for now)

I could go on and on about what this thing can do. You've already seen this idea implemented in a few of my works - this is my attempt to lift the good ideas out and export them for other uses. If pictures speak to you as much as words, the screenshot attached attempts to draw out what I've explained.

Now, without further delay, the full code as of now:

Code: QB64:

See post marked as Best Answer.

Cobalt · « **Reply #1 on:** April 05, 2020, 09:47:09 pm »

I think I just shat my brains out my eyes......

Are you attempting to push those 'linked lists' you're always talking about?

Dimster · « **Reply #2 on:** April 06, 2020, 11:41:09 am »

Hi StxAxTIC - This is a little over my head, so hopefully the answer to this question isn't something obvious, but: if Steve or any of the others in the example database should have multiple homes in various countries, can this array just expand the Location field, or would you need multiple new entries for Steve added to the end of the array, or is this array fixed from the start (i.e., would you need to anticipate the possibility of more Locations for each person when the array(s) are formed)?

STxAxTIC · « **Reply #3 on:** April 06, 2020, 06:36:35 pm »

Good question Dimster - so in this scheme you can insert anything, anywhere, even whole arrays within arrays. I play with this in a trivial way in the third example (my code has been updated since the first post), but for now those kinds of moves look like this:

Code: QB64:

SUB DemoTreeEdit
    DIM a AS LONG
    DIM b AS LONG
    DIM c AS LONG
 
    a = NewSoftArray(0, "Tree Edit Test")
    a = LinkEast(a, NewStringElement("QB64 Buddy"))
    a = LinkEast(a, NewStringElement("Handle"))
    b = LinkEast(a, NewStringElement("flukiluke"))
    a = LinkSouth(a, NewStringElement("Name"))
    b = LinkEast(a, NewStringElement("Luke C."))
    a = LinkSouth(a, NewStringElement("Country"))
    b = LinkEast(a, NewStringElement("Australia"))
    c = LinkEast(b, NewStringElement("Locality"))
    b = LinkEast(c, NewStringElement("Down Under"))
    a = LinkSouth(a, NewStringElement("Birthyear"))
    b = LinkEast(a, NewIntegerElement(1523))
    c = LinkSouth(b, NewStringElement("May?"))
 
    ' Display and query tests
    CALL PrintSoftArray(ArrayId("Tree Edit Test"))
    PRINT
 
    PRINT "Inserting `Get it?' into list..."
    a = InsertEast(SeekString("Down Under", FromLabel("Tree Edit Test"), 1), NewStringElement("Get it?"))
    PRINT "Adding new entry to bottom of list..."
    a = InsertSouth(SeekString("QB64 Buddy", FromLabel("Tree Edit Test"), 1), NewStringElement("QB64 Enemy"))
    PRINT "Editing Birthyear..."
    a = EditIntegerReference(StepUsing(SeekString("Birthyear", FromLabel("Tree Edit Test"), 1), "e"), 1855)
    PRINT "Deleting a few entries under Country..."
    a = LinkEast(SeekString("Country", FromLabel("Tree Edit Test"), 1), SeekString("Down Under", FromLabel("Tree Edit Test"), 1))
    PRINT "Unlinking Name (and child elements)..."
    a = Unlink(SeekString("Name", FromLabel("Tree Edit Test"), 1))
 
    PRINT
    CALL PrintSoftArray(ArrayId("Tree Edit Test"))
END SUB
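The bodies of InsertEast and friends live in the full code, so the following is only a guess at the kind of pointer work such an edit involves: a hypothetical SpliceEast that puts a new element between x and its current East neighbour while keeping the West pointers consistent. The library's InsertEast may behave differently.

```qb64
' Sketch: splicing a new element into an East chain.
' SpliceEast is a hypothetical helper written for illustration.
' -1 means "no link".
TYPE Element
    Reference AS LONG
    East AS LONG
    West AS LONG
END TYPE
REDIM SHARED Elements(0) AS Element
REDIM SHARED StringData(0) AS STRING

DIM a AS LONG, b AS LONG, n AS LONG, x AS LONG
a = NewNode&("Australia")
b = NewNode&("Down Under")
Elements(a).East = b: Elements(b).West = a
n = NewNode&("Locality")
SpliceEast a, n ' chain becomes Australia -> Locality -> Down Under

' Walk forward along East...
x = a
DO WHILE x <> -1
    PRINT StringData(Elements(x).Reference); " ";
    x = Elements(x).East
LOOP
PRINT
' ...and back along West, to check the reverse pointers agree.
x = b
DO WHILE x <> -1
    PRINT StringData(Elements(x).Reference); " ";
    x = Elements(x).West
LOOP
PRINT

FUNCTION NewNode& (s AS STRING)
    DIM id AS LONG
    id = UBOUND(Elements) + 1
    REDIM _PRESERVE Elements(id) AS Element
    REDIM _PRESERVE StringData(id) AS STRING
    StringData(id) = s
    Elements(id).Reference = id
    Elements(id).East = -1
    Elements(id).West = -1
    NewNode& = id
END FUNCTION

SUB SpliceEast (x AS LONG, newId AS LONG)
    DIM oldEast AS LONG
    oldEast = Elements(x).East
    ' The new element takes over x's old East neighbour...
    Elements(newId).East = oldEast
    IF oldEast <> -1 THEN Elements(oldEast).West = newId
    ' ...and then becomes x's East neighbour.
    Elements(x).East = newId
    Elements(newId).West = x
END SUB
```

The forward walk prints Australia, Locality, Down Under and the backward walk prints the same three in reverse, so the West pointers stayed in sync. Removing an element is the reverse move: point x's East at the removed element's East and repair that element's West.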

Ashish · « **Reply #4 on:** April 07, 2020, 08:19:08 am »

For those having a bit of trouble with linked lists (as I did :) ), here is a short video:

Ashish · « **Reply #5 on:** April 07, 2020, 08:23:58 am »

This is cool! I will experiment with this soon...

luke · « **Reply #6 on:** April 07, 2020, 09:00:05 am »

Off-the-cuff idea:

Instead of having explicit North/South/East/West, each element can have as many linked elements as you want by giving it a `Links AS STRING` field, then doing something like:

Code:

DEFLNG A-Z

set_link a.Links, 2, 10
set_link a.Links, 1, 12
PRINT get_link(a.Links, 2)
PRINT get_link(a.Links, 1)

'Set the nth link of s$ to l
SUB set_link (s$, n, l)
    p = n * LEN(l) + 1
    IF LEN(s$) < p THEN s$ = s$ + STRING$(p + LEN(l) - LEN(s$) - 1, MKL$(-1))
    MID$(s$, p) = MKL$(l)
END SUB

'Get the nth link in s$
FUNCTION get_link (s$, n)
    p = n * LEN(get_link) + 1
    IF LEN(s$) < p THEN get_link = -1 ELSE get_link = CVL(MID$(s$, p, LEN(get_link)))
END FUNCTION

STxAxTIC · « **Reply #7 on:** April 07, 2020, 10:10:29 am »

I like that idea Luke, definitely saving some potential for it. I had reason to choose the NSEW scheme based on symmetry and minimalism. It's turning out that I only need two fundamental directions to fake any number of array dimensions I want, hence the bias toward South and East in any recursive tree traversal I do. The North and West directions are there for symmetry, basically: backward traversal. If I generalize the number of links per node to N, it will really be 2N to keep the symmetry. No big deal until I remember that I'm storing 2N+2 LONG variables per element (the 2N links plus Identity and Reference), so the payoff is delayed in that case.
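Putting numbers on that, assuming 4-byte LONG fields and leaving the Species string out of the count:

```latex
\text{bytes per element} = 4\,(2N + 2), \qquad
N = 2:\; 4 \cdot 6 = 24, \qquad
N = 3:\; 4 \cdot 8 = 32
```

At 10,000 elements that is about 240,000 bytes of links and addresses with NSEW, versus 320,000 with three axes.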

Anyway, your idea is well received; it's been on my mind for down the road, as needed.

Dimster · « **Reply #8 on:** April 07, 2020, 10:53:25 am »

STxAxTIC, I'm getting an illegal string conversion error on this line:

Code: QB64:

     a = NewSoftArray(0, "Tree Edit Test")  

and on this code for Luke's

Code: QB64:

set_link a.Links, 2, 10

Saying it requires a string on that line.

Not sure exactly what I would need to do to overcome those errors. Is it just a matter of adding some DATA lines for those arrays?

luke · « **Reply #9 on:** April 07, 2020, 11:21:21 am »

Quote from: Dimster on April 07, 2020, 10:53:25 am

and on this code for Luke's
Code: QB64:
set_link a.Links, 2, 10
Saying it requires a string on that line.

It was only a sketch of an idea; a.Links is a string as implied above.
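For anyone who wants to run the idea, here is a minimal sketch that wraps it in a real TYPE. The Node type, the LINK_BYTES constant and the copy-through-a-temporary step are assumptions of this sketch rather than luke's exact code; the copy just avoids depending on whether a STRING field can be modified when passed to a SUB. It also assumes a QB64 version that allows variable-length STRING fields in a TYPE, as the Element type in the first post already does.

```qb64
' Sketch: luke's idea with a real TYPE, so a.Links really is a STRING.
' Assumptions: variable-length STRING fields are allowed in a TYPE, and a
' CONST holds the 4-byte link size instead of LEN().
DEFLNG A-Z
CONST LINK_BYTES = 4 ' size of one LONG link

TYPE Node
    Links AS STRING ' packed LONGs: slot n (counting from 0) holds link n
END TYPE

DIM a AS Node
DIM work AS STRING

' Work on a plain string copy, then store it back in the field.
work = a.Links
set_link work, 2, 10
set_link work, 1, 12
a.Links = work

PRINT LEN(a.Links) '      12: three 4-byte slots (0, 1, 2)
PRINT get_link(work, 2) ' 10
PRINT get_link(work, 1) ' 12
PRINT get_link(work, 0) ' -1: slot padded but never set
PRINT get_link(work, 7) ' -1: past the end of the string

'Set the nth link of s$ to l
SUB set_link (s$, n, l)
    p = n * LINK_BYTES + 1
    ' Pad with CHR$(255); four such bytes read back through CVL as -1.
    IF LEN(s$) < p + LINK_BYTES - 1 THEN s$ = s$ + STRING$(p + LINK_BYTES - 1 - LEN(s$), 255)
    MID$(s$, p) = MKL$(l)
END SUB

'Get the nth link in s$ (-1 if it was never set)
FUNCTION get_link (s$, n)
    p = n * LINK_BYTES + 1
    IF LEN(s$) < p + LINK_BYTES - 1 THEN get_link = -1 ELSE get_link = CVL(MID$(s$, p, LINK_BYTES))
END FUNCTION
```

Slots count from 0, and any slot that was padded but never set (or lies past the end of the string) reads back as -1, so "no link" needs no separate bookkeeping.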

STxAxTIC · « **Reply #10 on:** April 07, 2020, 11:42:22 am »

Dimster, it looks like you tried to paste Luke's sample into the IDE next to my code, or didn't copy the whole box. The box in the first post should run as is (at the bottom).

_vince · « **Reply #11 on:** April 08, 2020, 09:38:21 am »

I guess it is a quadruply linked list, adding another direction of freedom to the doubly linked list. You could always generalize to an n-linked list, which might be what luke was getting at, but each k = 1, ..., n would be a dimension of direction, like a k-axis, so each k would correspond to two pointers: left/right or up/down, or rather +k/-k.
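One way to picture that generalisation, as a sketch only: give each element 2n link slots, with slot 2k-1 for the +k neighbour and slot 2k for the -k neighbour. The fixed-size Links array, AXES = 3, and the names Slot&, LinkAxis and Neighbor& are all assumptions made for illustration.

```qb64
' Sketch of the "+k/-k" generalisation: n axes, two link slots per axis.
' Assumptions: links live in a fixed 2-D LONG array (one row per element),
' slot 2k-1 holds the +k neighbour, slot 2k the -k neighbour, -1 = none.
CONST AXES = 3 '           n, the number of axes (arbitrary here)
CONST MAX_ELEMENTS = 100 ' fixed capacity for the sketch
DIM SHARED Links(1 TO MAX_ELEMENTS, 1 TO 2 * AXES) AS LONG

DIM i AS LONG, j AS LONG
FOR i = 1 TO MAX_ELEMENTS
    FOR j = 1 TO 2 * AXES
        Links(i, j) = -1
    NEXT
NEXT

LinkAxis 1, 2, 3 ' element 2 is the +3 neighbour of element 1
PRINT Neighbor&(1, 3, 1) '  2
PRINT Neighbor&(2, 3, -1) ' 1: the -3 back-link was set by LinkAxis
PRINT Neighbor&(1, 1, 1) '  -1: nothing linked along axis 1

' Map (axis k, direction +1/-1) to a column of Links.
FUNCTION Slot& (k AS LONG, dirSign AS LONG)
    IF dirSign > 0 THEN Slot& = 2 * k - 1 ELSE Slot& = 2 * k
END FUNCTION

' Make b the +k neighbour of a, and a the -k neighbour of b.
SUB LinkAxis (a AS LONG, b AS LONG, k AS LONG)
    Links(a, Slot&(k, 1)) = b
    Links(b, Slot&(k, -1)) = a
END SUB

FUNCTION Neighbor& (x AS LONG, k AS LONG, dirSign AS LONG)
    Neighbor& = Links(x, Slot&(k, dirSign))
END FUNCTION
```

With n = 2 the four slots could play the roles of East/West and South/North, which recovers the quadruply linked case.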

I've never seen this one before, but one structure that's not unheard of is a linked list of trees. Binary trees, heaps, etc. are handy for fast lookup and shuffling of particularly sorted data. This gives you a coarse linked list (LL) for lookup and a potentially more efficient tree for finer traversing.
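Only the tree half of that idea is sketched here: a binary search tree stored in the soft-array style, one hard array of nodes plus LONG pointers with -1 for "none". Borrowing West/East as the left/right children, and the names InsertValue, FindValue& and PrintInOrder, are assumptions for illustration.

```qb64
' Sketch: a binary search tree kept in the same style as the soft array,
' i.e. one hard array of nodes plus LONG pointers with -1 meaning "none".
' Reusing West/East as the left/right children is an assumption made here.
TYPE TreeNode
    Value AS LONG
    West AS LONG ' left child: smaller values
    East AS LONG ' right child: larger or equal values
END TYPE
REDIM SHARED Tree(0) AS TreeNode
DIM SHARED Root AS LONG
Root = -1

DIM i AS LONG, v AS LONG
FOR i = 1 TO 7
    READ v
    InsertValue v
NEXT
DATA 4,8,1,5,9,2,6

PrintInOrder Root ' visits the values in sorted order
PRINT
PRINT FindValue&(9); FindValue&(3) ' node index of 9, then -1 (not present)

SUB InsertValue (x AS LONG)
    DIM id AS LONG, cur AS LONG
    id = UBOUND(Tree) + 1
    REDIM _PRESERVE Tree(id) AS TreeNode
    Tree(id).Value = x: Tree(id).West = -1: Tree(id).East = -1
    IF Root = -1 THEN Root = id: EXIT SUB
    cur = Root
    DO ' walk down until a free child slot is found
        IF x < Tree(cur).Value THEN
            IF Tree(cur).West = -1 THEN Tree(cur).West = id: EXIT SUB
            cur = Tree(cur).West
        ELSE
            IF Tree(cur).East = -1 THEN Tree(cur).East = id: EXIT SUB
            cur = Tree(cur).East
        END IF
    LOOP
END SUB

FUNCTION FindValue& (x AS LONG)
    DIM cur AS LONG
    cur = Root
    DO WHILE cur <> -1
        IF x = Tree(cur).Value THEN FindValue& = cur: EXIT FUNCTION
        IF x < Tree(cur).Value THEN cur = Tree(cur).West ELSE cur = Tree(cur).East
    LOOP
    FindValue& = -1
END FUNCTION

SUB PrintInOrder (node AS LONG)
    DIM l AS LONG, r AS LONG
    IF node = -1 THEN EXIT SUB
    l = Tree(node).West: r = Tree(node).East
    PrintInOrder l
    PRINT Tree(node).Value;
    PrintInOrder r
END SUB
```

The in-order walk prints 1 2 4 5 6 8 9 regardless of insertion order, and FindValue& visits at most one node per level instead of scanning every element, which is the "fast lookup" being referred to.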

Dimster · « **Reply #12 on:** April 08, 2020, 10:07:59 am »

I've been trying to visualize the structure. Are we talking about multiple 2-dimensional arrays, like lists of data with each array linked by a pointer, or is this a multi-dimensional array, like a cube or group of cube-like arrays, each connected by a pointer? If it's cube-like, wouldn't that take massive memory to store data?

Cobalt · « **Reply #13 on:** April 08, 2020, 10:49:37 am »

How about you make a video tutorial on this, Bill?

STxAxTIC · « **Reply #14 on:** April 10, 2020, 01:00:17 am »

Hello Dimster

So the best way to visualize the structure, I was hoping, is in the PNG image I attached at the top. The only arrays in the whole program are just one-dimensional arrays for holding strings, ints, doubles, and so on.

The meat and potatoes of the thing is in a UDT called an "element". Elements contain NOTHING but (i) a unique address for the element, (ii) a pointer to the index on the array holding a string, int, or double, (iii) a "species" variable saying which array to point to, and (iv) up to four more pointers to head over to different elements. Stare at the picture and look at the example drawn out - it will click.

@Cobalt - we'll see about a video. There are so many good videos on linked lists already, but if I feel that my documentation is lacking at the end of this, I'll do a video...
