Author Topic: Variable length string database, using an index file (Read 6708 times)

SMcNeill · « **on:** October 02, 2020, 12:19:40 pm »

Code: QB64: [Select]

'Random length string database creation.
'This demo will utilize two different files to manage our database.
'the first one will be the data, and the second will be our index to the data
 
TYPE RecordType
    Name AS STRING
    Age AS _BYTE
    Sex AS STRING
    Phone AS STRING
END TYPE
 
TYPE IndexType
    StartPosition AS LONG
    LengthName AS LONG 'track how long the name is
    LengthSex AS LONG 'track how long the sex is
    LengthPhone AS LONG 'track how long the phone is
END TYPE
 
DEFLNG A-Z
DIM SHARED Record AS RecordType, Index AS IndexType
DIM SHARED RecordNumber, RecordCount
 
OPEN "Demo.dba" FOR BINARY AS #1 'the demo database
OPEN "Demo.ndx" FOR BINARY AS #2 'the demo index
RecordCount = LOF(2) \ LEN(Index)
 
 
DO
    choice = ShowOptions
    SELECT CASE choice
        CASE 1: AddRecord
        CASE 2:
        CASE 3:
        CASE 4: RecordNumber = RecordNumber - 1: IF RecordNumber < 1 THEN RecordNumber = RecordCount
        CASE 5: RecordNumber = RecordNumber + 1: IF RecordNumber > RecordCount THEN RecordNumber = 1
        CASE 6: SYSTEM
    END SELECT
LOOP
 
SUB ShowMainInfo
    CLS
    IF RecordNumber > 0 AND RecordNumber <= RecordCount THEN 'Get the current record and display it (skipped while the database is still empty)
        GET #2, (RecordNumber - 1) * LEN(Index) + 1, Index
        Record.Name = SPACE$(Index.LengthName)
        Record.Sex = SPACE$(Index.LengthSex)
        Record.Phone = SPACE$(Index.LengthPhone)
        GET #1, Index.StartPosition, Record.Name
        GET #1, , Record.Age
        GET #1, , Record.Sex
        GET #1, , Record.Phone
    ELSE
        Record.Name = ""
        Record.Age = 0
        Record.Sex = ""
        Record.Phone = ""
    END IF
 
 
 
    PRINT "Steve's Variable Length Database Demo"
    PRINT
    PRINT "Record Number "; RecordNumber; " of "; RecordCount
    PRINT "Name : "; Record.Name
    PRINT "Age  : "; Record.Age
    PRINT "Sex  : "; Record.Sex
    PRINT "Phone: "; Record.Phone
 
    PRINT
    PRINT
END SUB
 
SUB AddRecord
    RecordNumber = 0 'Display a blank record
    ShowMainInfo
    RecordCount = RecordCount + 1 'increase our total count of records
    RecordNumber = RecordCount 'And set our current record to the new record count value
    PRINT "ENTER Name : "
    PRINT "ENTER Age  : "
    PRINT "ENTER Sex  : "
    PRINT "ENTER Phone: "
 
    LOCATE 10, 14: INPUT ; ""; Record.Name
    LOCATE 11, 14: INPUT ; ""; Record.Age
    LOCATE 12, 14: INPUT ; ""; Record.Sex
    LOCATE 13, 14: INPUT ; ""; Record.Phone
    filesize = LEN(Record.Name) + LEN(Record.Age) + LEN(Record.Sex) + LEN(Record.Phone)
    Index.StartPosition = LOF(1) + 1
    Index.LengthName = LEN(Record.Name)
    Index.LengthSex = LEN(Record.Sex)
    Index.LengthPhone = LEN(Record.Phone)
    PUT #2, (RecordCount - 1) * LEN(Index) + 1, Index
    t$ = Record.Name: PUT #1, LOF(1) + 1, t$ 'We must use a temp string, as we can't put a variable length string type to a file
    PUT #1, , Record.Age
    t$ = Record.Sex: PUT #1, , t$
    t$ = Record.Phone: PUT #1, , t$
END SUB
 
 
 
FUNCTION ShowOptions
    ShowMainInfo
    PRINT "1) Add Record"
    PRINT "2) Delete Record (Not Implemented Yet)"
    PRINT "3) Edit Record (Not Implemented Yet)"
    PRINT "4) Previous Record"
    PRINT "5) Next Record"
    PRINT "6) Quit"
    PRINT
    PRINT
    DO
        i$ = INPUT$(1)
        SELECT CASE i$
            CASE "1" TO "6": ShowOptions = VAL(i$): EXIT FUNCTION
        END SELECT
    LOOP
END FUNCTION
 

Folks have recently been talking about how to make databases with BINARY vs RANDOM access, and somebody brought up how they'd manage variable length strings with a database, using line terminations and parsing... (I think it might have been bplus who mentioned that method.)

Here's how I generally work with handling variable length strings with a database.

For each variable length database, I usually use two databases -- one for the data, and one for an index to the data, which is what I'm doing with the above. (Though sometimes, I'll pack both files into one database, with the index being a set positional header, and the data coming after that header -- but I thought I'd show the simplest form of the process first.)

Now, before I let the demo get so complicated that it might turn folks off from looking at it, I'm just going to post the bare bones of the process first. The code above basically doesn't do anything except allow us to ADD RECORDS, and browse those records sequentially -- but it does show how we'd GET/PUT our information, and track where all that information sits on the disk for us.

RecordNumber is the current record that we're looking at
RecordCount is the total number of records which our database contains.

"Demo.dba" is the demo database
"Demo.ndx" is the demo index

In AddRecord, you can see where we get the information from the user and how we put the proper information onto the drive for us, so we can access it later, and in ShowMainInfo, you can see the process by which we get that information back for us.

Honestly, I don't think there's anything very complicated about what we're doing here, so I really don't know what I need to comment on, or what questions someone might have about the process. If anyone has any specific questions, feel free to ask, and I'll happily answer them, but the process is really very simple:

One file is the user's data, the other file tracks each record's position and lengths inside that file, so we only retrieve and work with what we want, when we want it.
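As a small illustration of why the index gives direct access, here's a minimal sketch of the offset arithmetic that the GET #2 and PUT #2 lines above rely on. It assumes the four-LONG IndexType from the listing above (16 bytes per entry) and touches no files; the loop bound of 3 is just for show.

```qb64
'Where does index entry N live inside Demo.ndx?
'Assumes the IndexType from the first listing: four LONGs, 16 bytes in total.
TYPE IndexType
    StartPosition AS LONG
    LengthName AS LONG
    LengthSex AS LONG
    LengthPhone AS LONG
END TYPE

DIM Index AS IndexType
DIM n AS LONG

PRINT "Each index entry is"; LEN(Index); "bytes"
FOR n = 1 TO 3
    'Same formula the demo uses for GET #2 / PUT #2
    PRINT "Entry"; n; "starts at byte"; (n - 1) * LEN(Index) + 1
NEXT
```

Since every index entry is the same size, entries 1, 2 and 3 sit at bytes 1, 17 and 33 of the index file, no matter how long the strings in the data file are.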

A simple database is included below, but you can freely ignore it if you want. Just run the code above and add your own records and browse them all you want. ;)

SpriggsySpriggs · « **Reply #1 on:** October 02, 2020, 12:57:32 pm »

I've never used a database like that so would there be any benefit over using a MySQL database other than connectivity reliance?

SMcNeill · « **Reply #2 on:** October 02, 2020, 01:27:30 pm »

Quote from: SpriggsySpriggs on October 02, 2020, 12:57:32 pm

I've never used a database like that so would there be any benefit over using a MySQL database other than connectivity reliance?

The main benefits are you get to use variable length strings, and you can still access each record directly, no matter where it’s located inside your data file.

Record 1 might start at byte 1, go to byte 20.
Record 2 might start at byte 21, go to byte 50.
Record 3 might start at byte 51, go to byte 100...

Now, if you edit Record 2, and it becomes 40 bytes worth of information, it’ll no longer fit in the same 30 bytes it utilized before. So, you just shuffle it to the end of the database, like so:

Record 1 — start position 1, end position 20
Record 2 — start position 101, end position 140
Record 3 — start position 51, end position 100

Of course, now you have unreferenced data in bytes 21 to 50, so at some point you’d want to pack and reorganize your database, when it’s inactive and not in use, to get rid of that. (Or have new records check for unreferenced gaps, and see if any is long enough to hold your new data.)

File sizes tend to be as small as possible, and still allow direct access, but you have to plan to handle the extra complexity of maintaining both the database and its index. ;)
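For anyone curious what "check for unreferenced gaps" could look like, here's a rough sketch using the three example records above. It is only one possible approach, not something from the demo: the arrays, the byte map and the names (StartPos, EndPos, Used, DataSize) are all made up for the illustration, and a byte map like this would be far too wasteful for a big data file.

```qb64
'Find unreferenced gaps by flagging every byte that some index entry covers.
'Hypothetical in-memory example; the numbers come from the Record 1-3 example above.
CONST DataSize = 140 'total bytes in the example data file

DIM StartPos(1 TO 3) AS LONG, EndPos(1 TO 3) AS LONG
DIM Used(1 TO DataSize) AS _BYTE 'one flag per byte of the data file
DIM i AS LONG, b AS LONG, GapStart AS LONG

StartPos(1) = 1: EndPos(1) = 20
StartPos(2) = 101: EndPos(2) = 140 'the edited record, moved to the end
StartPos(3) = 51: EndPos(3) = 100

FOR i = 1 TO 3
    FOR b = StartPos(i) TO EndPos(i)
        Used(b) = -1
    NEXT
NEXT

GapStart = 0
FOR b = 1 TO DataSize
    IF Used(b) = 0 THEN
        IF GapStart = 0 THEN GapStart = b 'a new gap begins here
    ELSEIF GapStart > 0 THEN
        PRINT "Gap from byte"; GapStart; "to"; b - 1
        GapStart = 0
    END IF
NEXT
IF GapStart > 0 THEN PRINT "Gap from byte"; GapStart; "to"; DataSize
```

With the example numbers, the only gap reported is bytes 21 to 50, the old home of Record 2. A new record of 30 bytes or less could be dropped in there instead of at the end of the file.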

SMcNeill · « **Reply #3 on:** October 05, 2020, 11:48:45 am »

A small update to this little routine, now that I'm back home and have a little time to work on it again.

Code: QB64: [Select]

'Random length string database creation.
'This demo will utilize two different files to manage our database.
'the first one will be the data, and the second will be our index to the data
 
TYPE RecordType
    NAME AS STRING
    Age AS _BYTE
    Sex AS STRING
    Phone AS STRING
END TYPE
 
TYPE IndexType
    Valid AS _BYTE
    StartPosition AS LONG
    LengthName AS LONG 'track how long the name is
    LengthSex AS LONG 'track how long the sex is
    LengthPhone AS LONG 'track how long the phone is
END TYPE
 
DEFLNG A-Z
DIM SHARED Record AS RecordType, Index AS IndexType
DIM SHARED RecordNumber, RecordCount
 
OPEN "Demo.dba" FOR BINARY AS #1 'the demo database
OPEN "Demo.ndx" FOR BINARY AS #2 'the demo index
RecordCount = LOF(2) \ LEN(Index)
 
 
DO
    choice = ShowOptions
    SELECT CASE choice
        CASE 1: AddRecord
        CASE 2: DeleteRecord
        CASE 3: UnDeleteRecord
        CASE 4:
        CASE 5: RecordNumber = RecordNumber - 1: IF RecordNumber < 1 THEN RecordNumber = RecordCount
        CASE 6: RecordNumber = RecordNumber + 1: IF RecordNumber > RecordCount THEN RecordNumber = 1
        CASE 7: SYSTEM
    END SELECT
LOOP
 
SUB ShowMainInfo
    CLS
    IF RecordNumber > 0 AND RecordNumber <= RecordCount THEN 'Get the current record and display it
        GET #2, (RecordNumber - 1) * LEN(Index) + 1, Index
        Record.NAME = SPACE$(Index.LengthName)
        Record.Sex = SPACE$(Index.LengthSex)
        Record.Phone = SPACE$(Index.LengthPhone)
        GET #1, Index.StartPosition, Record.NAME
        GET #1, , Record.Age
        GET #1, , Record.Sex
        GET #1, , Record.Phone
    ELSE
        Record.NAME = ""
        Record.Age = 0
        Record.Sex = ""
        Record.Phone = ""
    END IF
 
    IF RecordNumber > 0 AND RecordNumber <= RecordCount AND NOT Index.Valid THEN 'only a real record can be flagged as deleted
        Record.NAME = "DELETED RECORD"
        Record.Age = 0
        Record.Sex = ""
        Record.Phone = ""
    END IF
 
    PRINT "Steve's Variable Length Database Demo"
    PRINT
    PRINT "Record Number "; RecordNumber; " of "; RecordCount
    PRINT "Name : ";
    IF NOT Index.Valid THEN COLOR _RGB(255, 0, 0)
    PRINT Record.NAME
    COLOR _RGB(255, 255, 255)
    PRINT "Age  : "; Record.Age
    PRINT "Sex  : "; Record.Sex
    PRINT "Phone: "; Record.Phone
    PRINT
    PRINT
END SUB
 
SUB AddRecord
    RecordNumber = 0 'Display a blank record
    ShowMainInfo
    RecordCount = RecordCount + 1 'increase our total count of records
    RecordNumber = RecordCount 'And set our current record to the new record count value
    PRINT "ENTER Name : "
    PRINT "ENTER Age  : "
    PRINT "ENTER Sex  : "
    PRINT "ENTER Phone: "
 
    LOCATE 10, 14: INPUT ; ""; Record.NAME
    LOCATE 11, 14: INPUT ; ""; Record.Age
    LOCATE 12, 14: INPUT ; ""; Record.Sex
    LOCATE 13, 14: INPUT ; ""; Record.Phone
    filesize = LEN(Record.NAME) + LEN(Record.Age) + LEN(Record.Sex) + LEN(Record.Phone)
    Index.Valid = -1
    Index.StartPosition = LOF(1) + 1
    Index.LengthName = LEN(Record.NAME)
    Index.LengthSex = LEN(Record.Sex)
    Index.LengthPhone = LEN(Record.Phone)
    PUT #2, (RecordCount - 1) * LEN(Index) + 1, Index
    t$ = Record.NAME: PUT #1, LOF(1) + 1, t$ 'We must use a temp string, as we can't put a variable length string type to a file
    PUT #1, , Record.Age
    t$ = Record.Sex: PUT #1, , t$
    t$ = Record.Phone: PUT #1, , t$
END SUB
 
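SUB DeleteRecord
    IF RecordNumber < 1 OR RecordNumber > RecordCount THEN EXIT SUB 'no current record to flag
    Index.Valid = 0
    PUT #2, (RecordNumber - 1) * LEN(Index) + 1, Index
END SUB
 
SUB UnDeleteRecord
    IF RecordNumber < 1 OR RecordNumber > RecordCount THEN EXIT SUB 'no current record to flag
    Index.Valid = -1
    PUT #2, (RecordNumber - 1) * LEN(Index) + 1, Index
END SUB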
 
 
 
FUNCTION ShowOptions
    ShowMainInfo
    PRINT "1) Add Record"
    PRINT "2) Delete Current Record"
    PRINT "3) Undelete Current Record"
    PRINT "4) Edit Current Record (Not Implemented Yet)"
    PRINT "5) Previous Record"
    PRINT "6) Next Record"
    PRINT "7) Quit"
    PRINT
    PRINT
    DO
        i$ = INPUT$(1)
        SELECT CASE i$
            CASE "1" TO "7": ShowOptions = VAL(i$): EXIT FUNCTION
        END SELECT
    LOOP
END FUNCTION
 

Now, if you look at my menu, you'll see that I've now got two new options up and going for us: DELETE a record, and UNDELETE a record.

Yep, that's right! Not only can you delete a record, but you can also undelete that record, in case you accidentally purge something you didn't really want to, out of your database. Personally, as someone who has worked with various data structures and databases over the last thirty+ years, I really and truly wish that ALL databases were forced to be written like this. Personally, if I hired somebody to write me a custom database, and it didn't have an undelete option, I'd fire the BEEEPER so fast his office chair would create tornado-anime wind effects from his head spinning so fast!

The basic concept here is sooooooo simple, I honestly don't understand why we never see it in use, unless it's in a "professional" database program/format. Simply add one extra byte to your data type, and use it to track if the data is valid, or deleted. In this case, I've simply added that extra byte to my index.

The delete/undelete subs are just this simple:

Code: QB64: [Select]

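SUB DeleteRecord
    IF RecordNumber < 1 OR RecordNumber > RecordCount THEN EXIT SUB 'no current record to flag
    Index.Valid = 0
    PUT #2, (RecordNumber - 1) * LEN(Index) + 1, Index
END SUB
 
SUB UnDeleteRecord
    IF RecordNumber < 1 OR RecordNumber > RecordCount THEN EXIT SUB 'no current record to flag
    Index.Valid = -1
    PUT #2, (RecordNumber - 1) * LEN(Index) + 1, Index
END SUB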

Now, when going this route, you'll also want to add one more function to your database at a later date -- PURGE database. Think of the whole process as basically Delete sends the record to the Recycle Bin, and Purge empties the Recycle Bin. It makes deleting of records into a two-step process, which really helps eliminate human error and makes maintaining the database easier and more efficient. Of course, people are idiots, and even if you had a 57-step verification process, folks would blaze through it and still screw up and delete stuff they shouldn't, but using a method like this prevents the "DAMMIT, I HIT THE WRONG KEY AND JUST LOST MR JOHNSON'S CONTACT INFORMATION AND SHIT!!!!!!!!!"

Note: As my data structure has changed slightly, the database and index from the original post no longer work with this version. Normally, I'd code a database with future compatibility built into it, so as to not have that issue, but I'm hoping to highlight one little database process at a time here, as I create these little examples for people to look at and study. Either grab the new files below (they only have 2 quick entries for demo purposes), or just take 30 seconds and create yourself a few records of your own to play around with.

As of now, you can:

Add an entry
Delete an entry
Undelete an entry
Move forward and backwards between entries.

Next will be:
Editing an entry
Packing the database

Like before, I'll wait about a week or so, so folks can post any questions or comments they might have on anything up to this point, before I add the next little "feature" into this series of demos. Feel free to speak up and ask anything that interests you, or any questions you might have, and I'll be happy to answer them for you. ;)

SMcNeill · « **Reply #4 on:** October 05, 2020, 05:31:30 pm »

So, I fibbed. I was bored here alone, and went ahead and finished up the Edit Record routine, without waiting for a week to do so. Sue me, or else wait a week to look over this demo. ;D

Code: QB64: [Select]

'Random length string database creation.
'This demo will utilize two different files to manage our database.
'the first one will be the data, and the second will be our index to the data
 
TYPE RecordType
    NAME AS STRING
    Age AS _BYTE
    Sex AS STRING
    Phone AS STRING
END TYPE
 
TYPE IndexType
    Valid AS _BYTE
    StartPosition AS LONG
    LengthName AS LONG 'track how long the name is
    LengthSex AS LONG 'track how long the sex is
    LengthPhone AS LONG 'track how long the phone is
END TYPE
 
DEFLNG A-Z
DIM SHARED Record AS RecordType, Index AS IndexType
DIM SHARED RecordNumber, RecordCount
 
OPEN "Demo.dba" FOR BINARY AS #1 'the demo database
OPEN "Demo.ndx" FOR BINARY AS #2 'the demo index
RecordCount = LOF(2) \ LEN(Index)
RecordNumber = 1
 
DO
    choice = ShowOptions
    SELECT CASE choice
        CASE 1: AddRecord
        CASE 2: DeleteRecord
        CASE 3: UnDeleteRecord
        CASE 4: EditRecord
        CASE 5: RecordNumber = RecordNumber - 1: IF RecordNumber < 1 THEN RecordNumber = RecordCount
        CASE 6: RecordNumber = RecordNumber + 1: IF RecordNumber > RecordCount THEN RecordNumber = 1
        CASE 7: SYSTEM
    END SELECT
LOOP
 
SUB ShowMainInfo
    CLS
    COLOR _RGB(255, 255, 255)
    IF RecordNumber > 0 AND RecordNumber <= RecordCount THEN 'Get the current record and display it
        GET #2, (RecordNumber - 1) * LEN(Index) + 1, Index
        IF Index.Valid THEN
            Record.NAME = SPACE$(Index.LengthName)
            Record.Sex = SPACE$(Index.LengthSex)
            Record.Phone = SPACE$(Index.LengthPhone)
            GET #1, Index.StartPosition, Record.NAME
            GET #1, , Record.Age
            GET #1, , Record.Sex
            GET #1, , Record.Phone
        ELSE
            Record.NAME = "DELETED RECORD"
            Record.Age = 0
            Record.Sex = ""
            Record.Phone = ""
        END IF
    ELSE
        Record.NAME = "NO RECORD"
        Record.Age = 0
        Record.Sex = ""
        Record.Phone = ""
    END IF
 
    PRINT "Steve's Variable Length Database Demo"
    PRINT
    PRINT "Record Number "; RecordNumber; " of "; RecordCount
    PRINT "Name : ";
    IF NOT Index.Valid THEN COLOR _RGB(255, 0, 0)
    PRINT Record.NAME
    COLOR _RGB(255, 255, 255)
    PRINT "Age  : "; Record.Age
    PRINT "Sex  : "; Record.Sex
    PRINT "Phone: "; Record.Phone
    PRINT
    PRINT
END SUB
 
SUB AddRecord
    RecordNumber = 0 'Display a blank record
    ShowMainInfo
    RecordCount = RecordCount + 1 'increase our total count of records
    RecordNumber = RecordCount 'And set our current record to the new record count value
    PRINT "ENTER Name : "
    PRINT "ENTER Age  : "
    PRINT "ENTER Sex  : "
    PRINT "ENTER Phone: "
 
    LOCATE 10, 14: INPUT "", Record.NAME
    LOCATE 11, 14: INPUT "", Record.Age
    LOCATE 12, 14: INPUT "", Record.Sex
    LOCATE 13, 14: INPUT "", Record.Phone
    filesize = LEN(Record.NAME) + LEN(Record.Age) + LEN(Record.Sex) + LEN(Record.Phone)
    Index.Valid = -1
    Index.StartPosition = LOF(1) + 1
    Index.LengthName = LEN(Record.NAME)
    Index.LengthSex = LEN(Record.Sex)
    Index.LengthPhone = LEN(Record.Phone)
    PUT #2, (RecordCount - 1) * LEN(Index) + 1, Index
    t$ = Record.NAME: PUT #1, Index.StartPosition, t$ 'We must use a temp string, as we can't put a variable length string type to a file
    PUT #1, , Record.Age
    t$ = Record.Sex: PUT #1, , t$
    t$ = Record.Phone: PUT #1, , t$
END SUB
 
SUB EditRecord
    CLS
    ShowMainInfo
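    IF RecordNumber < 1 OR RecordNumber > RecordCount THEN EXIT SUB 'nothing to edit
    'Measure the old slot from the index, not from Record: a deleted record only holds placeholder text
    oldfilesize = Index.LengthName + LEN(Record.Age) + Index.LengthSex + Index.LengthPhone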
    LOCATE 10, 1: PRINT "ENTER Name : "
    LOCATE 11, 1: PRINT "ENTER Age  : "
    LOCATE 12, 1: PRINT "ENTER Sex  : "
    LOCATE 13, 1: PRINT "ENTER Phone: "
 
    LOCATE 10, 14: INPUT "", Record.NAME
    LOCATE 11, 14: INPUT "", Record.Age
    LOCATE 12, 14: INPUT "", Record.Sex
    LOCATE 13, 14: INPUT "", Record.Phone
    filesize = LEN(Record.NAME) + LEN(Record.Age) + LEN(Record.Sex) + LEN(Record.Phone)
    Index.Valid = -1
    IF filesize > oldfilesize THEN 'if our edit is larger than our old data
        Index.StartPosition = LOF(1) + 1 'we have to put it at the end of the existing datafile
    END IF 'otherwise,we just put it where it currently exists
    Index.LengthName = LEN(Record.NAME)
    Index.LengthSex = LEN(Record.Sex)
    Index.LengthPhone = LEN(Record.Phone)
    PUT #2, (RecordNumber - 1) * LEN(Index) + 1, Index
    t$ = Record.NAME: PUT #1, Index.StartPosition, t$ 'We must use a temp string, as we can't put a variable length string type to a file
    PUT #1, , Record.Age
    t$ = Record.Sex: PUT #1, , t$
    t$ = Record.Phone: PUT #1, , t$
END SUB
 
 
 
 
 
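SUB DeleteRecord
    IF RecordNumber < 1 OR RecordNumber > RecordCount THEN EXIT SUB 'no current record to flag
    Index.Valid = 0
    PUT #2, (RecordNumber - 1) * LEN(Index) + 1, Index
END SUB
 
SUB UnDeleteRecord
    IF RecordNumber < 1 OR RecordNumber > RecordCount THEN EXIT SUB 'no current record to flag
    Index.Valid = -1
    PUT #2, (RecordNumber - 1) * LEN(Index) + 1, Index
END SUB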
 
 
 
FUNCTION ShowOptions
    ShowMainInfo
    PRINT "1) Add Record"
    PRINT "2) Delete Current Record"
    PRINT "3) Undelete Current Record"
    PRINT "4) Edit Current Record"
    PRINT "5) Previous Record"
    PRINT "6) Next Record"
    PRINT "7) Quit"
    PRINT
    PRINT
    DO
        i$ = INPUT$(1)
        SELECT CASE i$
            CASE "1" TO "7": ShowOptions = VAL(i$): EXIT FUNCTION
        END SELECT
    LOOP
END FUNCTION
 

Now, since we have variable length strings, we can't just always replace the old data with the new data. What if what we're writing is longer than what we wrote before?

For example, what if we had a phone number of 555-1234, and we wanted to edit it to include the area code: 555-555-1234? We certainly couldn't just toss those 12 bytes into the same space which only contained 8 bytes of information before... So what do we do??

Simplest solution is to just work it like a new record and put the information at the end of the existing database, and point our index to it there, and that's what I'm doing in the code here.

Code: QB64: [Select]

    IF filesize > oldfilesize THEN 'if our edit is larger than our old data
        Index.StartPosition = LOF(1) + 1 'we have to put it at the end of the existing datafile
    END IF 'otherwise,we just put it where it currently exists

Now, if you take a moment to think about it, this will obviously create a database with unreferenced data.

For example, let's say I had the following for data: AppleBananaCarrot

Now, if I want to replace that Banana with Beet, I can just swap it out where it exists: AppleBeetnaCarrot

Now, even in this scenario, we now have 2 bytes in that data that we're no longer using -- the last "na" in Banana, which Beet just left behind without touching...

But, what if we wanted to replace that "Banana" with "Bell Pepper"? We can't just easily put it where the Banana was, so in this case we just tack it to the end of the original data and point the 2nd index to it there: AppleBananaCarrotBell Pepper

Originally, our indexes were: 1, 6, 12 (Where the A, B, C started for each data entry)
Now, our indexes are 1, 18, 12 (Which is where the A, the B in Bell Pepper, and the C all start in AppleBananaCarrotBell Pepper)

Since we tacked Bell Pepper to the end of our data, we now have a segment in there which is just stray, unreferenced information: the "Banana" still sitting at bytes 6 to 11 of AppleBananaCarrotBell Pepper

Nothing points to it. Nothing uses it. It's just extra, left over JUNK in our database....

...Which is why we'd want to include one more option in our main menu, for the user to PURGE that junk and compact that database down from AppleBananaCarrotBell Pepper to AppleCarrotBell Pepper. But, I'll save that little bit of coding demo for later, so folks can focus on the edit process and tracking the positioning of data in our databases for now.
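If you'd like to play with the Apple/Banana/Carrot example without touching any files, here's a minimal in-memory sketch of the same decision EditRecord makes. The string Store stands in for the data file, and StartPos/FieldLen stand in for one index entry; those names are made up for this illustration and aren't part of the demo.

```qb64
'In-memory version of the edit rule: overwrite in place if it fits, else append.
'Store plays the part of the data file; StartPos and FieldLen play the index entry.
DIM Store AS STRING, NewData AS STRING
DIM StartPos AS LONG, FieldLen AS LONG

Store = "AppleBananaCarrot"
StartPos = 6: FieldLen = 6 'index entry for "Banana"

NewData = "Beet" 'change this to "Bell Pepper" to see the append case

IF LEN(NewData) > FieldLen THEN
    StartPos = LEN(Store) + 1 'same idea as LOF(1) + 1 in the file version
    Store = Store + NewData 'tack the new data onto the end
ELSE
    MID$(Store, StartPos, LEN(NewData)) = NewData 'overwrite in place, leftovers stay put
END IF
FieldLen = LEN(NewData)

PRINT "Data : "; Store
PRINT "Start:"; StartPos
PRINT "Len  :"; FieldLen
PRINT "Field: "; MID$(Store, StartPos, FieldLen)
```

With "Beet" the data becomes AppleBeetnaCarrot and the entry stays at start 6 with length 4. With "Bell Pepper" the data becomes AppleBananaCarrotBell Pepper and the entry moves to start 18 with length 11, leaving the old "Banana" behind as junk.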

Any questions? Comments? Insights? Feel free to share them below, and I'll answer them as time and life allows. ;D

Dimster · « **Reply #5 on:** October 06, 2020, 09:41:55 am »

Steve - would you be able to run that edit by me one more time? If I had a field carrying data "AppleBananaCarrot" and wanted to change this data to "AppleBeetCarrot" - assuming I'm typing the complete new data for that field, why would the "na" re-appear in the field's data?

SMcNeill · « **Reply #6 on:** October 06, 2020, 10:25:23 am »

Quote from: Dimster on October 06, 2020, 09:41:55 am

Steve - would you be able to run that edit by me one more time? If I had a field carrying data "AppleBananaCarrot" and wanted to change this data to "AppleBeetCarrot" - assuming I'm typing the complete new data for that field, why would the "na" re-appear in the field's data?

Certainly. :)

Now, our original data looked like AppleBananaCarrot.

Our original index would look like
Record 1: Start 1, Length 5
Record 2: Start 6, Length 6
Record 3: Start 12, Length 6

With this information, we can pick any record which we want, and retrieve the variable length data that it contains. For the 2nd record, we start at byte number 6, and retrieve 6 bytes of data: "Banana". Without these two pieces of information, how do we know how long our data is? How do we know this isn't one piece of information on an AppleBananaCarrot smoothie? Our index tracks both the start, and the length of our data fields for us.
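Here's that lookup as a tiny sketch, again with a plain string standing in for the data file. The arrays StartPos and FieldLen are just illustration names for the start/length pairs listed above.

```qb64
'Pull each record out of the packed string using only its start and length.
DIM Store AS STRING
DIM StartPos(1 TO 3) AS LONG, FieldLen(1 TO 3) AS LONG
DIM i AS LONG

Store = "AppleBananaCarrot"
StartPos(1) = 1: FieldLen(1) = 5
StartPos(2) = 6: FieldLen(2) = 6
StartPos(3) = 12: FieldLen(3) = 6

FOR i = 1 TO 3
    PRINT "Record"; i; ": "; MID$(Store, StartPos(i), FieldLen(i))
NEXT
```

That prints Apple, Banana and Carrot in turn; nothing in the data itself marks where one record stops and the next begins.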

Now, if we end up editing and replacing "Banana" with "Beet", we want to make the changes to the database as simple as possible. Since "Beet" is less than, or equal to, the length of "Banana", we can just swap its data directly in place of the old, and then update our index.

Our Edited Data now looks like: AppleBeetnaCarrot

And our edited Index now looks like:
Record 1: Start 1, Length 5
Record 2: Start 6, Length 4
Record 3: Start 12, Length 6
Record 4+: All remains the same

Record 2's length (6 down to 4) is the only index value that changed, so you can see that we've made the minimal amount of changes possible to preserve our data, and keep all the information referenced properly.

Now, you *could* do as you suggest, and pack the database as you go along, but that's actually going to be a lot of processing to do.

AppleBeetCarrot <-- Let's say you want to make this your data base, so that "na" isn't in there any longer.

You'd have to recalculate the index completely, just to make that one single change:
Record 1: Start 1, Length 5
Record 2: Start 6, Length 4
Record 3: Start 10, Length 6
Record 4+: All need their Start references updated, Length will remain the same

Notice how much you're changing and reindexing here: Record 3's start moves from 12 to 10, and every record after it needs a new start too. Now, for a database of 3 records, there's no worry over it. You'll be finished in microseconds. But what if we suddenly are working with a database with 1,000,000,000 records?? Do you *really* want to rebuild that index and pack that database with each and every change to the dataset? Or do you think it might be better to just leave some junk in it (like that lost "na" from banana), and then schedule a purge/packing of the data for 2AM on Sunday, when the company is closed and nobody is using the database?

It's not that the "na" is re-appearing in that field's data. It's simply that you can put, at most, 6-bytes of information in the same place where "banana" was stored in the dataset. Since you're only putting 4-bytes there, there's no reason to mess with the other 2 at all, at this point in time. (You're changing "Bana" for "Beet".) Your new index won't reference those other 2-bytes (the "na"), and when you rebuild/pack the database in the future, you'll get rid of that unreferenced data, and your database *will* end up looking like you suggest "AppleBeetCarrot". You're simply ignoring them completely, for the moment, and just making the minimal changes to your index and data, to store your new information, without having to alter more than a single record at the time. Packing and cleaning the database can be done at a later time, to save a little disk space on your machines. ;)

The whole point is to make your edits, and store your data, while only affecting a single index/record at a time. If you were using a RANDOM access file, you certainly don't want to have to redo the whole file just because you changed a single record, and it's the same here with a variable-length data field. You don't want to have to move and change the whole dang database, just to update a single record -- you only want to make changes which impact on that one record, and that record only.

As long as your edit is no longer than the old data, you can just update the data and the length index, and be done with it.
If it's longer than the old data, you then simply tack it to the end of your existing data, and point the start and length index to that new position, and be done with it.

Both options keep you only making changes to a single record/index, but both will also leave stray, unreferenced data behind. Going this route, you'll want to manually clean up the database at some later point in time, which won't affect user productivity any.

(And all you do then is basically just read your records, and write them to a new database/index, in order -- then delete or archive the old set, and rename the new set to take its place in your program. Personally, I prefer to archive the old datasets, just in case the drive ever gets corrupted, or something unforeseen happens like your intern deletes the whole thing on you, while trying to show off...)
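A rough sketch of that read-and-rewrite pass might look like the following. It assumes the index layout with the Valid byte from the later listings, and it is not the finished purge routine for this series. The output names "Packed.dba" and "Packed.ndx" and the variable names are made up for the example. Deleted records are skipped, so the surviving records get renumbered, and the archive/rename step is left out.

```qb64
'Pack pass: copy every valid record into fresh files, back to back.
'Packed.dba / Packed.ndx are hypothetical names chosen for this sketch.
TYPE IndexType
    Valid AS _BYTE
    StartPosition AS LONG
    LengthName AS LONG
    LengthSex AS LONG
    LengthPhone AS LONG
END TYPE

DIM Index AS IndexType
DIM Age AS _BYTE
DIM NameText AS STRING, SexText AS STRING, PhoneText AS STRING
DIM i AS LONG, RecordCount AS LONG, Kept AS LONG, NextPos AS LONG

'BINARY mode never truncates, so clear out any leftovers from an earlier run
IF _FILEEXISTS("Packed.dba") THEN KILL "Packed.dba"
IF _FILEEXISTS("Packed.ndx") THEN KILL "Packed.ndx"

OPEN "Demo.dba" FOR BINARY AS #1
OPEN "Demo.ndx" FOR BINARY AS #2
OPEN "Packed.dba" FOR BINARY AS #3
OPEN "Packed.ndx" FOR BINARY AS #4

RecordCount = LOF(2) \ LEN(Index)
NextPos = 1 'next free byte in the packed data file

FOR i = 1 TO RecordCount
    GET #2, (i - 1) * LEN(Index) + 1, Index
    IF Index.Valid THEN 'deleted records are simply not copied
        NameText = SPACE$(Index.LengthName)
        SexText = SPACE$(Index.LengthSex)
        PhoneText = SPACE$(Index.LengthPhone)
        SEEK #1, Index.StartPosition
        GET #1, , NameText
        GET #1, , Age
        GET #1, , SexText
        GET #1, , PhoneText

        Kept = Kept + 1
        Index.StartPosition = NextPos 'lengths stay the same, only the start moves
        PUT #4, (Kept - 1) * LEN(Index) + 1, Index
        SEEK #3, NextPos
        PUT #3, , NameText
        PUT #3, , Age
        PUT #3, , SexText
        PUT #3, , PhoneText
        NextPos = NextPos + LEN(NameText) + LEN(Age) + LEN(SexText) + LEN(PhoneText)
    END IF
NEXT
CLOSE

PRINT "Kept"; Kept; "of"; RecordCount; "records"
```

The old pair of files is left untouched, which fits the archive-first habit: only once the packed set has been checked would you swap it in for the original.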

Dimster · « **Reply #7 on:** October 06, 2020, 10:59:35 am »

Thanks Steve
