Author Topic: Variable length string database, using an index file  (Read 4357 times)

0 Members and 1 Guest are viewing this topic.

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
    • View Profile
    • Steve’s QB64 Archive Forum
Variable length string database, using an index file
« on: October 02, 2020, 12:19:40 pm »
Code: QB64: [Select]
  1. 'Random length string database creation.
  2. 'This demo will utilize two different files to manage our database.
  3. 'the first one will be the data, and the second will be our index to the data
  4.  
  5. TYPE RecordType
  6.     Name AS STRING
  7.     Age AS _BYTE
  8.     Sex AS STRING
  9.     Phone AS STRING
  10.  
  11. TYPE IndexType
  12.     StartPosition AS LONG
  13.     LengthName AS LONG 'track how long the name is
  14.     LengthSex AS LONG 'track how long the sex is
  15.     LengthPhone AS LONG 'track how long the phone is
  16.  
  17. DEFLNG A-Z
  18. DIM SHARED Record AS RecordType, Index AS IndexType
  19. DIM SHARED RecordNumber, RecordCount
  20.  
  21. OPEN "Demo.dba" FOR BINARY AS #1 'the demo database
  22. OPEN "Demo.ndx" FOR BINARY AS #2 'the demo index
  23. RecordCount = LOF(2) \ LEN(Index)
  24.  
  25.  
  26.     choice = ShowOptions
  27.     SELECT CASE choice
  28.         CASE 1: AddRecord
  29.         CASE 2:
  30.         CASE 3:
  31.         CASE 4: RecordNumber = RecordNumber - 1: IF RecordNumber < 1 THEN RecordNumber = RecordCount
  32.         CASE 5: RecordNumber = RecordNumber + 1: IF RecordNumber > RecordCount THEN RecordNumber = 1
  33.         CASE 6: SYSTEM
  34.     END SELECT
  35.  
  36. SUB ShowMainInfo
  37.     CLS
  38.     IF RecordNumber > 0 THEN 'Get the current record and display it
  39.         GET #2, (RecordNumber - 1) * LEN(Index) + 1, Index
  40.         Record.Name = SPACE$(Index.LengthName)
  41.         Record.Sex = SPACE$(Index.LengthSex)
  42.         Record.Phone = SPACE$(Index.LengthPhone)
  43.         GET #1, Index.StartPosition, Record.Name
  44.         GET #1, , Record.Age
  45.         GET #1, , Record.Sex
  46.         GET #1, , Record.Phone
  47.     ELSE
  48.         Record.Name = ""
  49.         Record.Age = 0
  50.         Record.Sex = ""
  51.         Record.Phone = ""
  52.     END IF
  53.  
  54.  
  55.  
  56.     PRINT "Steve's Variable Length Database Demo"
  57.     PRINT
  58.     PRINT "Record RECORD "; RecordNumber; " of "; RecordCount
  59.     PRINT "Name : "; Record.Name
  60.     PRINT "Age  : "; Record.Age
  61.     PRINT "Sex  : "; Record.Sex
  62.     PRINT "Phone: "; Record.Phone
  63.  
  64.     PRINT
  65.     PRINT
  66.  
  67. SUB AddRecord
  68.     RecordNumber = 0 'Display a blank record
  69.     ShowMainInfo
  70.     RecordCount = RecordCount + 1 'increase our total count of records
  71.     RecordNumber = RecordCount 'And set our current record to the new record count value
  72.     PRINT "ENTER Name : "
  73.     PRINT "ENTER Age  : "
  74.     PRINT "ENTER Sex  : "
  75.     PRINT "ENTER Phone: "
  76.  
  77.     LOCATE 10, 14: INPUT ; ""; Record.Name
  78.     LOCATE 11, 14: INPUT ; ""; Record.Age
  79.     LOCATE 12, 14: INPUT ; ""; Record.Sex
  80.     LOCATE 13, 14: INPUT ; ""; Record.Phone
  81.     filesize = LEN(Record.Name) + LEN(Record.Age) + LEN(Record.Sex) + LEN(Record.Phone)
  82.     Index.StartPosition = LOF(1) + 1
  83.     Index.LengthName = LEN(Record.Name)
  84.     Index.LengthSex = LEN(Record.Sex)
  85.     Index.LengthPhone = LEN(Record.Phone)
  86.     PUT #2, (RecordCount - 1) * LEN(Index) + 1, Index
  87.     t$ = Record.Name: PUT #1, LOF(1) + 1, t$ 'We must use a temp string, as we can't put a variable length string type to a file
  88.     PUT #1, , Record.Age
  89.     t$ = Record.Sex: PUT #1, , t$
  90.     t$ = Record.Phone: PUT #1, , t$
  91.  
  92.  
  93.  
  94. FUNCTION ShowOptions
  95.     ShowMainInfo
  96.     PRINT "1) Add Record"
  97.     PRINT "2) Delete Record Record (Not Implemented Yet)"
  98.     PRINT "3) Edit Record Record (Not Implemented Yet)"
  99.     PRINT "4) Previous Record"
  100.     PRINT "5) Next Record"
  101.     PRINT "6) Quit"
  102.     PRINT
  103.     PRINT
  104.     DO
  105.         i$ = INPUT$(1)
  106.         SELECT CASE i$
  107.             CASE "1" TO "6": ShowOptions = VAL(i$): EXIT FUNCTION
  108.         END SELECT
  109.     LOOP
  110.  

Folks have recently been talking about how to make databases with BINARY vs RANDOM access, and somebody brought up how they'd manage variable length strings with a database, using line terminations and parsing...  (I think it might have been bplus who mentioned that method.)

Here's how I generally work with handling variable length strings with a database.

For each variable length database, I usually use two databases -- one for the data, and one for an index to the data, which is what I'm doing with the above.  (Though sometimes, I'll pack both files into one database, with the index being a set positional header, and the data coming after that header -- but I thought I'd show the simplest form of the process first.)

Now, before I let the demo get too complicated that it might turn folks off from looking at it, I'm just going to post the bare bones of the process first.  The code above basically doesn't do anything except allow us to ADD RECORDS, and browse those records sequentially -- but it does show how we'd GET/PUT our information, and track where all that information is while on a disk for us.

RecordNumber is the current record that we're looking at
RecordCount is the total number of records which our database contains.

"Demo.dba" is the demo database
"Demo.ndx" is the demo index

In  AddRecord, you can see where we get the information from the user and how we put the proper information onto the drive for us, so we can access it later, and in ShowMainInfo, you can see the process by which we get that information back for us.



Honestly, I don't think there's anything very complicated about what we're doing here, so I really don't know what I need to comment on, or what questions someone might have about the process.  If anyone has any specific questions, feel free to ask, and I'll happily answer them, but the process is really very simple:

One file is the user's data, the other file tracks each record's position and lengths inside that file, so we only retrieve and work with what we want, when we want it.

A simple database is included below, but you can freely ignore it if you want.  Just run the code above and add your own records and browse them all you want.  ;)


* Demo.dba (Filesize: 0.11 KB, Downloads: 219)
* Demo.ndx (Filesize: 0.03 KB, Downloads: 214)
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

Offline SpriggsySpriggs

  • Forum Resident
  • Posts: 1145
  • Larger than life
    • View Profile
    • GitHub
Re: Variable length string database, using an index file
« Reply #1 on: October 02, 2020, 12:57:32 pm »
I've never used a database like that so would there be any benefit over using a MySQL database other than connectivity reliance?
Shuwatch!

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
    • View Profile
    • Steve’s QB64 Archive Forum
Re: Variable length string database, using an index file
« Reply #2 on: October 02, 2020, 01:27:30 pm »
I've never used a database like that so would there be any benefit over using a MySQL database other than connectivity reliance?

The main benefits are you get to use variable length strings, and you can still access each record directly, no matter where it’s located at inside your data file.

Record 1 might start as byte 1, go to byte 20.
Record 2 might start at byte 21, go to 50
Record 3 might start at byte 51, go to 100...

Now, if you edit Record 2, and it becomes 40 bytes worth of information, it’ll no longer fit in the same 30 bytes it utilized before.  Now, you just shuffle it to the endof the database, like so:

Record 1 — start position 1, end position 20
Record 2 — start position 101, end position 140
Record 3 — start position 51, end position 100

Of course, now you have unreferenced data in bytes 21 to 50, so at some point you’d want to pack and reorganize your database, when it’s inactive and not in use, to get rid of that.  (Or have new records check for unreferenced gaps, and see if any is long enough to hold your new data.)

File sizes tend to be as small as possible, and still allow direct access, but you have to plan to handle the extra complexity of maintaining both the database and its index.  ;)
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
    • View Profile
    • Steve’s QB64 Archive Forum
Re: Variable length string database, using an index file
« Reply #3 on: October 05, 2020, 11:48:45 am »
A small update to this little routine, now that I'm back home and have a little time to work on it again.

Code: QB64: [Select]
  1. 'Random length string database creation.
  2. 'This demo will utilize two different files to manage our database.
  3. 'the first one will be the data, and the second will be our index to the data
  4.  
  5. TYPE RecordType
  6.     NAME AS STRING
  7.     Age AS _BYTE
  8.     Sex AS STRING
  9.     Phone AS STRING
  10.  
  11. TYPE IndexType
  12.     Valid AS _BYTE
  13.     StartPosition AS LONG
  14.     LengthName AS LONG 'track how long the name is
  15.     LengthSex AS LONG 'track how long the sex is
  16.     LengthPhone AS LONG 'track how long the phone is
  17.  
  18. DEFLNG A-Z
  19. DIM SHARED Record AS RecordType, Index AS IndexType
  20. DIM SHARED RecordNumber, RecordCount
  21.  
  22. OPEN "Demo.dba" FOR BINARY AS #1 'the demo database
  23. OPEN "Demo.ndx" FOR BINARY AS #2 'the demo index
  24. RecordCount = LOF(2) \ LEN(Index)
  25.  
  26.  
  27.     choice = ShowOptions
  28.     SELECT CASE choice
  29.         CASE 1: AddRecord
  30.         CASE 2: DeleteRecord
  31.         CASE 3: UnDeleteRecord
  32.         CASE 4:
  33.         CASE 5: RecordNumber = RecordNumber - 1: IF RecordNumber < 1 THEN RecordNumber = RecordCount
  34.         CASE 6: RecordNumber = RecordNumber + 1: IF RecordNumber > RecordCount THEN RecordNumber = 1
  35.         CASE 7: SYSTEM
  36.     END SELECT
  37.  
  38. SUB ShowMainInfo
  39.     CLS
  40.     IF RecordNumber > 0 AND RecordNumber <= RecordCount THEN 'Get the current record and display it
  41.         GET #2, (RecordNumber - 1) * LEN(Index) + 1, Index
  42.         Record.NAME = SPACE$(Index.LengthName)
  43.         Record.Sex = SPACE$(Index.LengthSex)
  44.         Record.Phone = SPACE$(Index.LengthPhone)
  45.         GET #1, Index.StartPosition, Record.NAME
  46.         GET #1, , Record.Age
  47.         GET #1, , Record.Sex
  48.         GET #1, , Record.Phone
  49.     ELSE
  50.         Record.NAME = ""
  51.         Record.Age = 0
  52.         Record.Sex = ""
  53.         Record.Phone = ""
  54.     END IF
  55.  
  56.     IF NOT Index.Valid THEN
  57.         Record.NAME = "DELETED RECORD"
  58.         Record.Age = 0
  59.         Record.Sex = ""
  60.         Record.Phone = ""
  61.     END IF
  62.  
  63.     PRINT "Steve's Variable Length Database Demo"
  64.     PRINT
  65.     PRINT "Record RECORD "; RecordNumber; " of "; RecordCount
  66.     PRINT "Name : ";
  67.     IF NOT Index.Valid THEN COLOR _RGB(255, 0, 0)
  68.     PRINT Record.NAME
  69.     COLOR _RGB(255, 255, 255)
  70.     PRINT "Age  : "; Record.Age
  71.     PRINT "Sex  : "; Record.Sex
  72.     PRINT "Phone: "; Record.Phone
  73.     PRINT
  74.     PRINT
  75.  
  76. SUB AddRecord
  77.     RecordNumber = 0 'Display a blank record
  78.     ShowMainInfo
  79.     RecordCount = RecordCount + 1 'increase our total count of records
  80.     RecordNumber = RecordCount 'And set our current record to the new record count value
  81.     PRINT "ENTER Name : "
  82.     PRINT "ENTER Age  : "
  83.     PRINT "ENTER Sex  : "
  84.     PRINT "ENTER Phone: "
  85.  
  86.     LOCATE 10, 14: INPUT ; ""; Record.NAME
  87.     LOCATE 11, 14: INPUT ; ""; Record.Age
  88.     LOCATE 12, 14: INPUT ; ""; Record.Sex
  89.     LOCATE 13, 14: INPUT ; ""; Record.Phone
  90.     filesize = LEN(Record.NAME) + LEN(Record.Age) + LEN(Record.Sex) + LEN(Record.Phone)
  91.     Index.Valid = -1
  92.     Index.StartPosition = LOF(1) + 1
  93.     Index.LengthName = LEN(Record.NAME)
  94.     Index.LengthSex = LEN(Record.Sex)
  95.     Index.LengthPhone = LEN(Record.Phone)
  96.     PUT #2, (RecordCount - 1) * LEN(Index) + 1, Index
  97.     t$ = Record.NAME: PUT #1, LOF(1) + 1, t$ 'We must use a temp string, as we can't put a variable length string type to a file
  98.     PUT #1, , Record.Age
  99.     t$ = Record.Sex: PUT #1, , t$
  100.     t$ = Record.Phone: PUT #1, , t$
  101.  
  102. SUB DeleteRecord
  103.     Index.Valid = 0
  104.     PUT #2, (RecordNumber - 1) * LEN(Index) + 1, Index
  105.  
  106. SUB UnDeleteRecord
  107.     Index.Valid = -1
  108.     PUT #2, (RecordNumber - 1) * LEN(Index) + 1, Index
  109.  
  110.  
  111.  
  112. FUNCTION ShowOptions
  113.     ShowMainInfo
  114.     PRINT "1) Add Record"
  115.     PRINT "2) Delete Current Record"
  116.     PRINT "3) Undelete Current Record"
  117.     PRINT "4) Edit Current Record (Not Implemented Yet)"
  118.     PRINT "5) Previous Record"
  119.     PRINT "6) Next Record"
  120.     PRINT "7) Quit"
  121.     PRINT
  122.     PRINT
  123.     DO
  124.         i$ = INPUT$(1)
  125.         SELECT CASE i$
  126.             CASE "1" TO "7": ShowOptions = VAL(i$): EXIT FUNCTION
  127.         END SELECT
  128.     LOOP
  129.  

Now, if you look at my menu, you'll see that I've now got two new options up and going for us:  DELETE a record, and UNDELETE a record.

Yep, that's right!  Not only can you delete a record, but you can also undelete that record, in case you accidently purge something you didn't really want to, out of your database.  Personally, as someone who has worked with various data structures and databases over the last thirty+ years, I really and truly wish that ALL databases were forced to be written like this.  Personally, if I hired somebody to write me a custom database, and it didn't have an undelete option, I'd fire the BEEEPER so fast his office chair would create tornado-anime wind effects from his head spinning so fast!

The basic concept here is sooooooo simple, I honestly don't understand why we never see it in use, unless it's in a "professional" database program/format.  Simply add one extra byte to your data type, and use it to track if the data is valid, or deleted.  In this case, I've simply added that extra byte to my index.

The delete/undelete subs are just this simple:

Code: QB64: [Select]
  1. SUB DeleteRecord
  2.     Index.Valid = 0
  3.     PUT #2, (RecordNumber - 1) * LEN(Index) + 1, Index
  4.  
  5. SUB UnDeleteRecord
  6.     Index.Valid = -1
  7.     PUT #2, (RecordNumber - 1) * LEN(Index) + 1, Index

Now, when going this route, you'll also want to add one more function to your database at a later date -- PURGE database.  Think of the whole process as basically Delete sends the record to the Recycle Bin, and Purge empties the Recycle Bin.  It makes deleting of records into a two-step process, which really helps eliminate human error and makes maintaining the database easier and more efficient.  OF course, people are idiots, and even if you had a 57-step verification process, folks would blaze through it and still screw up and delete stuff they shouldn't, but using a method like this prevents the "DAMMIT, I HIT THE WRONG KEY AND JUST LOST MR JOHNSON'S CONTACT INFORMATION AND SHIT!!!!!!!!!"

Note:  As my data structure has changed slightly, the original database and index from the original post no longer work with these.  Normally, I'd code a database with the intention of being future compatible into it, so as to not have that issue, but I'm hoping to highlight one little database process at a time here, as I create these little examples for people to look at and study.  Either grab the new files below (they only have 2 quick entries for demo purposes), or just take 30 seconds and create yourself a few records of your own to play around with.

As of now, you can:

Add an entry
Delete an entry
Undelete an entry
Move forward and backwards between entries.

Next will be:
Editing an entry
Packing the database



Like before, I'll wait about a week or so, so folks can post any questions or comments they might have on anything up to this point, before I add the next little "feature" into this series of demos.  Feel free to speak up and ask anything that interests you, or any questions you might have, and I'll be happy to answer them for you.  ;)
* Demo.dba (Filesize: 0.15 KB, Downloads: 201)
* Demo.ndx (Filesize: 0.05 KB, Downloads: 211)
« Last Edit: October 05, 2020, 05:32:01 pm by SMcNeill »
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
    • View Profile
    • Steve’s QB64 Archive Forum
Re: Variable length string database, using an index file
« Reply #4 on: October 05, 2020, 05:31:30 pm »
So, I fibbed.  I was bored here alone, and went ahead and finished up the Edit Record routine, without waiting for a week to do so.  Sue me, or else wait a week to read look over this demo.  ;D

Code: QB64: [Select]
  1. 'Random length string database creation.
  2. 'This demo will utilize two different files to manage our database.
  3. 'the first one will be the data, and the second will be our index to the data
  4.  
  5. TYPE RecordType
  6.     NAME AS STRING
  7.     Age AS _BYTE
  8.     Sex AS STRING
  9.     Phone AS STRING
  10.  
  11. TYPE IndexType
  12.     Valid AS _BYTE
  13.     StartPosition AS LONG
  14.     LengthName AS LONG 'track how long the name is
  15.     LengthSex AS LONG 'track how long the sex is
  16.     LengthPhone AS LONG 'track how long the phone is
  17.  
  18. DEFLNG A-Z
  19. DIM SHARED Record AS RecordType, Index AS IndexType
  20. DIM SHARED RecordNumber, RecordCount
  21.  
  22. OPEN "Demo.dba" FOR BINARY AS #1 'the demo database
  23. OPEN "Demo.ndx" FOR BINARY AS #2 'the demo index
  24. RecordCount = LOF(2) \ LEN(Index)
  25. RecordNumber = 1
  26.  
  27.     choice = ShowOptions
  28.     SELECT CASE choice
  29.         CASE 1: AddRecord
  30.         CASE 2: DeleteRecord
  31.         CASE 3: UnDeleteRecord
  32.         CASE 4: EditRecord
  33.         CASE 5: RecordNumber = RecordNumber - 1: IF RecordNumber < 1 THEN RecordNumber = RecordCount
  34.         CASE 6: RecordNumber = RecordNumber + 1: IF RecordNumber > RecordCount THEN RecordNumber = 1
  35.         CASE 7: SYSTEM
  36.     END SELECT
  37.  
  38. SUB ShowMainInfo
  39.     CLS
  40.     COLOR _RGB(255, 255, 255)
  41.     IF RecordNumber > 0 AND RecordNumber <= RecordCount THEN 'Get the current record and display it
  42.         GET #2, (RecordNumber - 1) * LEN(Index) + 1, Index
  43.         IF Index.Valid THEN
  44.             Record.NAME = SPACE$(Index.LengthName)
  45.             Record.Sex = SPACE$(Index.LengthSex)
  46.             Record.Phone = SPACE$(Index.LengthPhone)
  47.             GET #1, Index.StartPosition, Record.NAME
  48.             GET #1, , Record.Age
  49.             GET #1, , Record.Sex
  50.             GET #1, , Record.Phone
  51.         ELSE
  52.             Record.NAME = "DELETED RECORD"
  53.             Record.Age = 0
  54.             Record.Sex = ""
  55.             Record.Phone = ""
  56.         END IF
  57.     ELSE
  58.         Record.NAME = "NO RECORD"
  59.         Record.Age = 0
  60.         Record.Sex = ""
  61.         Record.Phone = ""
  62.     END IF
  63.  
  64.     PRINT "Steve's Variable Length Database Demo"
  65.     PRINT
  66.     PRINT "Record Number "; RecordNumber; " of "; RecordCount
  67.     PRINT "Name : ";
  68.     IF NOT Index.Valid THEN COLOR _RGB(255, 0, 0)
  69.     PRINT Record.NAME
  70.     COLOR _RGB(255, 255, 255)
  71.     PRINT "Age  : "; Record.Age
  72.     PRINT "Sex  : "; Record.Sex
  73.     PRINT "Phone: "; Record.Phone
  74.     PRINT
  75.     PRINT
  76.  
  77. SUB AddRecord
  78.     RecordNumber = 0 'Display a blank record
  79.     ShowMainInfo
  80.     RecordCount = RecordCount + 1 'increase our total count of records
  81.     RecordNumber = RecordCount 'And set our current record to the new record count value
  82.     PRINT "ENTER Name : "
  83.     PRINT "ENTER Age  : "
  84.     PRINT "ENTER Sex  : "
  85.     PRINT "ENTER Phone: "
  86.  
  87.     LOCATE 10, 14: INPUT "", Record.NAME
  88.     LOCATE 11, 14: INPUT "", Record.Age
  89.     LOCATE 12, 14: INPUT "", Record.Sex
  90.     LOCATE 13, 14: INPUT "", Record.Phone
  91.     filesize = LEN(Record.NAME) + LEN(Record.Age) + LEN(Record.Sex) + LEN(Record.Phone)
  92.     Index.Valid = -1
  93.     Index.StartPosition = LOF(1) + 1
  94.     Index.LengthName = LEN(Record.NAME)
  95.     Index.LengthSex = LEN(Record.Sex)
  96.     Index.LengthPhone = LEN(Record.Phone)
  97.     PUT #2, (RecordCount - 1) * LEN(Index) + 1, Index
  98.     t$ = Record.NAME: PUT #1, Index.StartPosition, t$ 'We must use a temp string, as we can't put a variable length string type to a file
  99.     PUT #1, , Record.Age
  100.     t$ = Record.Sex: PUT #1, , t$
  101.     t$ = Record.Phone: PUT #1, , t$
  102.  
  103. SUB EditRecord
  104.     CLS
  105.     ShowMainInfo
  106.     oldfilesize = LEN(Record.NAME) + LEN(Record.Age) + LEN(Record.Sex) + LEN(Record.Phone)
  107.     LOCATE 10, 1: PRINT "ENTER Name : "
  108.     LOCATE 11, 1: PRINT "ENTER Age  : "
  109.     LOCATE 12, 1: PRINT "ENTER Sex  : "
  110.     LOCATE 13, 1: PRINT "ENTER Phone: "
  111.  
  112.     LOCATE 10, 14: INPUT "", Record.NAME
  113.     LOCATE 11, 14: INPUT "", Record.Age
  114.     LOCATE 12, 14: INPUT "", Record.Sex
  115.     LOCATE 13, 14: INPUT "", Record.Phone
  116.     filesize = LEN(Record.NAME) + LEN(Record.Age) + LEN(Record.Sex) + LEN(Record.Phone)
  117.     Index.Valid = -1
  118.     IF filesize > oldfilesize THEN 'if our edit is larger than our old data
  119.         Index.StartPosition = LOF(1) + 1 'we have to put it at the end of the existing datafile
  120.     END IF 'otherwise,we just put it where it currently exists
  121.     Index.LengthName = LEN(Record.NAME)
  122.     Index.LengthSex = LEN(Record.Sex)
  123.     Index.LengthPhone = LEN(Record.Phone)
  124.     PUT #2, (RecordNumber - 1) * LEN(Index) + 1, Index
  125.     t$ = Record.NAME: PUT #1, Index.StartPosition, t$ 'We must use a temp string, as we can't put a variable length string type to a file
  126.     PUT #1, , Record.Age
  127.     t$ = Record.Sex: PUT #1, , t$
  128.     t$ = Record.Phone: PUT #1, , t$
  129.  
  130.  
  131.  
  132.  
  133.  
  134. SUB DeleteRecord
  135.     Index.Valid = 0
  136.     PUT #2, (RecordNumber - 1) * LEN(Index) + 1, Index
  137.  
  138. SUB UnDeleteRecord
  139.     Index.Valid = -1
  140.     PUT #2, (RecordNumber - 1) * LEN(Index) + 1, Index
  141.  
  142.  
  143.  
  144. FUNCTION ShowOptions
  145.     ShowMainInfo
  146.     PRINT "1) Add Record"
  147.     PRINT "2) Delete Current Record"
  148.     PRINT "3) Undelete Current Record"
  149.     PRINT "4) Edit Current Record"
  150.     PRINT "5) Previous Record"
  151.     PRINT "6) Next Record"
  152.     PRINT "7) Quit"
  153.     PRINT
  154.     PRINT
  155.     DO
  156.         i$ = INPUT$(1)
  157.         SELECT CASE i$
  158.             CASE "1" TO "7": ShowOptions = VAL(i$): EXIT FUNCTION
  159.         END SELECT
  160.     LOOP
  161.  

Now, since we have variable length strings, we can't just always replace the old data with the new data.  What if what we're writing is longer than what we wrote before? 

For example, what if we had a phone number of 555-1234, and we wanted to edit it to include the area code 555-555-1234.  We certainly couldn't just toss those 12 bytes into the same space which only contained 8 bytes of information before...   So what do we do??

Simplest solution is to just work it like a new record and put the information at the end of the existing database, and point our index to it there, and that's what I'm doing in the code here.

Code: QB64: [Select]
  1.     IF filesize > oldfilesize THEN 'if our edit is larger than our old data
  2.         Index.StartPosition = LOF(1) + 1 'we have to put it at the end of the existing datafile
  3.     END IF 'otherwise,we just put it where it currently exists


Now, if you take a moment to think about it, this will obviously create a database with unreferenced data.

For example, let's say I had the following for data: AppleBananaCarrot

Now, if I want to replace that Banana with Beet, I can just swap it out where it exists: AppleBeetnaCarrot

Now, even in this scenario, we now have 2 bytes in that data that we're no longer using -- the last "na" in Banana, which Beet just left behind without touching...

But, what if we wanted to replace that "Banana" with "Bell Pepper"?  We can't just easily put it where the Banana was, so in this case we just tack it to the end of the original data and point the 2nd index to it there: AppleBananaCarrotBell Pepper

Originally, our indexes were: 1, 6, 12 (Where the A, B, C started for each data entry)
Now, our indexes are 1, 17, 12 (Which is where the A, B in the Bell Pepper, and C all start at in AppleBananaCarrotBell Pepper)

Since we tacked Bell Pepper to the end of our data, we now have a segment in there which is just stray, unreferenced information:  AppleBananaCarrotBell Pepper

Nothing points to it.  Nothing uses it.  It's just extra, left over JUNK in our database....

...Which is why we'd want to include one more option in our main menu, for the user to PURGE that junk and compact that database down from AppleBananaCarrotBell Pepper to AppleCarrotBell Pepper.  But, I'll save that little bit of coding demo for later, so folks can focus on the edit process and tracking the positioning of data in our databases for now.

Any questions?  Comments?  Insights?  Feel free to share them below, and I'll answer them as time and life allows.  ;D
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

Offline Dimster

  • Forum Resident
  • Posts: 500
    • View Profile
Re: Variable length string database, using an index file
« Reply #5 on: October 06, 2020, 09:41:55 am »
Steve - would you be able to run that edit by me one more time? If I had a field carrying data "AppleBananaCarrot" and wanted to change this data to "AppleBeetCarrot" - assuming I'm typing the complete new data for that field, why would the "na" re-appear in the field's data?

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
    • View Profile
    • Steve’s QB64 Archive Forum
Re: Variable length string database, using an index file
« Reply #6 on: October 06, 2020, 10:25:23 am »
Steve - would you be able to run that edit by me one more time? If I had a field carrying data "AppleBananaCarrot" and wanted to change this data to "AppleBeetCarrot" - assuming I'm typing the complete new data for that field, why would the "na" re-appear in the field's data?

Certainly.  :)

Now, our original data looked like AppleBananaCarrot.

Our original index would look like
Record 1: Start 1, Length 5
Record 2: Start 6, Length 6
Record 3: Start 12, Length 6

With this information, we can pick any record which we want, and retrieve the variable length data that it contains.   For the 2nd record, we start at byte number 6, and retrieve 6 bytes of data:  "Banana".   Without these two pieces of information, how do we know how long our data is?  How do we know this isn't one piece of information on an AppleBananaCarrot smoothie?  Our index tracks both the start, and the length of our data fields for us.

Now, if we end up editing and replacing "Banana" with "Beet", we want to make the changes to the database as simple as possible.  Since "Beet" is less than, or equal to, the length of "Banana", we can just swap its data directly in place of the old, and then update our index.

Our Edited Data now looks like:  AppleBeetnaCarrot

And our edited Index now looks like:
Record 1: Start 1, Length 5
Record 2: Start 6, Length 4
Record 3: Start 12, Length 6
Record 4+:  All remains the same

With the BOLD ITALICS above, you can see that we've made the minimal amount of changes possible to preserve our data, and keep all the information referenced properly.

Now, you *could* do as you suggest, and pack the database as you go along, but that's actually going to be a lot of processing to do

AppleBeetCarrot  <-- Let's say you want to make this your data base, so that "na" isn't in there any longer.

You'd have to recalculate the index completely, just to make that one single change:
Record 1: Start 1, Length 5
Record 2: Start 6, Length 4
Record 3: Start 10, Length 6
Record 4+: All need their Start references updated, Length will remain the same

Notice with the italics here, how much you're changing, and reindexing?  Now, for a database of 3 records, there's no worry over it.  You'll be finished in microseconds.  But what if we suddenly are working with a database with 1,000,000,000 records??  Do you *really* want to rebuild that index and pack that database with each and every change to the dataset?  Or do you think it might be better to just leave some junk in it (like that lost "na" from banana), and then schedule a purge/packing of the data for 2AM on Sunday, when the company is closed and nobody is using the database?



It's not that the "na" is re-appearing in that field's data.  It's simply that you can put, at most, 6-bytes of information in the same place where "banana" was stored in the dataset.  Since you're only putting 4-bytes there, there's no reason to mess with the other 2 at all, at this point in time.  (You're changing "Bana" for "Beet".)  Your new index won't reference those other 2-bytes (the "na"), and when you rebuild/pack the database in the future, you'll get rid of that unreferenced data, and your database *will* end up looking like you suggest "AppleBeetCarrot".  You're simply ignoring them completely, for the moment, and just making the minimal changes to your index and data, to store your new information, without having to alter more than a single record at the time.  Packing and cleaning the database can be done at a later time, to save a little disk space on your machines.  ;)




The whole point is to make your edits, and store your data, while only affecting a single index/record at a time.  If you were using a RANDOM access file, you certainly don't want to have to redo the whole file just because you changed a single record, and it's the same here with a variable-length data field.  You don't want to have to move and change the whole dang database, just to update a single record -- you only want to make changes which impact on that one record, and that record only.

As long as your edit is less than, or equal to the old data, you can just update the data and the length index, and be done with it.
If it's longer than the old data, you then simply tack it to the end of your existing data, and point the start and length index to that new position, and be done with it.

Both options keep you only making changes to a single record/index, but both will also leave stray, unreferenced data behind.  Going this route, you'll want to manually clean up the database at some later point in time, which won't affect user productivity any. 

(And all you do then is basically just read your records, and write them to a new database/index, in order -- then delete or archive the old set, and rename the new set to take its place in your program.  Personally, I prefer to archive the old datasets, just in case the drive ever gets corrupted, or something unforeseen happens like your intern deletes the whole thing on you, while trying to show off...)
« Last Edit: October 06, 2020, 10:35:35 am by SMcNeill »
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

Offline Dimster

  • Forum Resident
  • Posts: 500
    • View Profile
Re: Variable length string database, using an index file
« Reply #7 on: October 06, 2020, 10:59:35 am »
Thanks Steve