« on: September 19, 2021, 09:00:57 pm »
The indexes contain key values like an account number. When you save the record, the account number is inserted into the index tree in the proper order. So if the account number is 100 it would be inserted into the tree right after 99. Normally, the records are contained in a block, say 100 records per block. The location of the record in the block is recorded with the account number. So, when you need to load account 100 again, the program looks through the index and finds 100, and then retrieves the location of the record within the block. It would then seek to that location and load the record.
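To make the lookup described above a bit more concrete, here is a rough sketch in Python. It is not how any particular database does it: a real index is usually a B-tree on disk, while this uses a plain sorted list searched with bisect. The class name IndexedFile, the 32-byte RECORD_SIZE, and the in-memory BytesIO standing in for the data file are all made up for illustration.

```python
import bisect
import io

RECORD_SIZE = 32  # assumed fixed record size, chosen only for this sketch


class IndexedFile:
    """Toy data file plus a sorted in-memory index (hypothetical names)."""

    def __init__(self):
        self.data = io.BytesIO()  # stands in for the data file on disk
        self.keys = []            # account numbers, kept in sorted order
        self.offsets = []         # offsets[i] is the byte position of keys[i]

    def save(self, account, payload):
        if len(payload) > RECORD_SIZE:
            raise ValueError("payload too large for this sketch")
        # Append the record to the end of the file and remember where it went.
        offset = self.data.seek(0, io.SEEK_END)
        self.data.write(payload.ljust(RECORD_SIZE, b"\x00"))
        # Insert the key in sorted position, so 100 lands right after 99.
        # Duplicate keys are not handled here.
        i = bisect.bisect_left(self.keys, account)
        self.keys.insert(i, account)
        self.offsets.insert(i, offset)

    def load(self, account):
        # Binary-search the index, then seek straight to the stored location.
        i = bisect.bisect_left(self.keys, account)
        if i == len(self.keys) or self.keys[i] != account:
            return None
        self.data.seek(self.offsets[i])
        return self.data.read(RECORD_SIZE).rstrip(b"\x00")


store = IndexedFile()
store.save(100, b"Alice")
store.save(99, b"Bob")
record = store.load(100)
```

Even though account 99 was saved second, it ends up ahead of 100 in the index, and loading 100 goes directly to its offset without scanning the file.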
Typically, records have a header, and the start of the header is what is actually stored in the index. The header is a fixed length, so the program knows how much data to read. After reading the header, the file pointer is sitting at the beginning of the record data. The header tells the program how long the record is, and the program then loads the record based on that length. That way you can have variable-length records, which saves space and makes it faster to retrieve data.
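The header idea can be sketched in a few lines of Python as well. I am assuming the simplest possible header here, a single 4-byte length field. Real formats usually carry more than that (flags, checksums, and so on), and the function names are just placeholders.

```python
import io
import struct

# Assumed header layout for this sketch: one 4-byte little-endian length.
HEADER = struct.Struct("<I")


def append_record(f, payload):
    """Write header + payload at the end of the file; return the header offset."""
    offset = f.seek(0, io.SEEK_END)
    f.write(HEADER.pack(len(payload)))
    f.write(payload)
    return offset  # this is the value the index would store


def read_record(f, offset):
    """Seek to the header, read the length, then read exactly that many bytes."""
    f.seek(offset)
    (length,) = HEADER.unpack(f.read(HEADER.size))
    # The file pointer now sits at the start of the record data.
    return f.read(length)


f = io.BytesIO()  # stands in for the data file
first = append_record(f, b"short")
second = append_record(f, b"a much longer record")
loaded = read_record(f, second)
```

Because each record carries its own length, a 5-byte record and a 20-byte record sit back to back with no padding between them.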
This is a very simplified explanation, and this isn't something that you would code yourself. You could use SQLite, for instance, and simply query the database for account number 100. Using a database, you can get 1 record or 100 with a simple query, which makes it much easier to manipulate this kind of data. You can do something like "Select * from customers where account_number = 100". If the record is there, the query returns it to the program, where you can access the data within the record. The star means to grab all the fields in the record. Customers would be a table that contains the data. You can also do things like join multiple tables and return the information as if it were one table. It is quite handy when you get the hang of it, and very powerful.
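For example, here is roughly what that query looks like from Python using the sqlite3 module that ships with it. The customers columns other than account_number, the orders table, and the sample rows are made up for illustration. The database lives in memory, so nothing is written to disk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Hypothetical schema: only account_number comes from the post above.
conn.execute(
    "CREATE TABLE customers ("
    "account_number INTEGER PRIMARY KEY, name TEXT, balance REAL)"
)
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, account_number INTEGER, item TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(99, "Bob", 10.0), (100, "Alice", 250.5)],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 100, "widget"), (2, 100, "gadget")],
)

# Fetch one record by key; the ? placeholder keeps the value out of the SQL text.
row = conn.execute(
    "SELECT * FROM customers WHERE account_number = ?", (100,)
).fetchone()

# Join two tables and get the result back as if it were one table.
joined = conn.execute(
    "SELECT c.name, o.item FROM customers c "
    "JOIN orders o ON o.account_number = c.account_number "
    "WHERE c.account_number = ? ORDER BY o.order_id",
    (100,),
).fetchall()
```

If the account does not exist, fetchone() returns None instead of a row, so the program can check for that before using the data.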
It is quite straightforward to load a database with preexisting data too, as there are built-in commands for that. It is actually quite common in many businesses. When I worked for Ford Motor Finance, we would get a daily dump from the mainframe that we would load into the Oracle database every day. Once you set it up, it is an automatic process. I am not suggesting you use a database for your data, as I don't really know what you are doing with it. Managing datasets has always been a problem, which is why almost everyone uses a database if they are working with a lot of data. It is just easier.
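As a small illustration of loading existing data, here is one way it could look with SQLite from Python. This is not the Oracle process mentioned above, and it uses the csv module plus executemany rather than a database's own bulk-load command. The column names, the load_dump function, and the sample dump are assumptions.

```python
import csv
import io
import sqlite3


def load_dump(conn, dump_file):
    """Load a CSV dump into the customers table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers ("
        "account_number INTEGER PRIMARY KEY, name TEXT, balance REAL)"
    )
    rows = [
        (int(r["account_number"]), r["name"], float(r["balance"]))
        for r in csv.DictReader(dump_file)
    ]
    # One transaction for the whole load; existing accounts are overwritten.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO customers VALUES (?, ?, ?)", rows
        )
    return len(rows)


# A StringIO stands in for the daily dump file.
dump = io.StringIO("account_number,name,balance\n99,Bob,10.0\n100,Alice,250.5\n")
conn = sqlite3.connect(":memory:")
loaded = load_dump(conn, dump)
count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

Running the same load again with updated rows replaces the old records for matching account numbers, which is roughly what you want from a repeated daily load.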