Author Topic: Random Access to a Sequential File part 2  (Read 2211 times)


Offline Dimster

  • Forum Resident
  • Posts: 500
    • View Profile
Random Access to a Sequential File part 2
« on: June 01, 2021, 04:59:09 pm »
I have been messing with SEEK. Once I open a sequential file and use SEEK #1, SEEK(1), it positions the pointer at the first byte in the first record. The file being searched contains only integers, but the program writing to the sequential file uses a SINGLE variable. There are 501 entries or records, and moving forward from record 1 to the last record is a simple matter of SEEK #1, SEEK(1) - 4 + 4. I believe this means each record is 4 bytes apart.

The conundrum I have is in going backward through the records. I can access the last record as SEEK #1, SEEK(1) - LEN(1); however, from there it would appear it is not a matter of subtracting 4 bytes but 6. So SEEK #1, SEEK(1) - 6 will step back through the records until suddenly it doesn't. At that point it becomes a -5, which will get you further toward the beginning of the sequential file, and when -5 stops working it eventually becomes a -4.

Does anyone know why I can go forward through to the end of a sequential file using SEEK #1, SEEK(1) - 4 + 4, but starting at the end of the file and going back to the beginning requires a -6, -5, and -4 to find each record?

Also, if this is the backward behavior with 501 records, would there be a larger negative value needed if I had 1002 records?

Thanks for taking a look at this.

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
    • View Profile
    • Steve’s QB64 Archive Forum
Re: Random Access to a Sequential File part 2
« Reply #1 on: June 02, 2021, 08:04:30 am »
I'd say it's because you're working with a file created by Windows, which uses 2 EOL (end of line) characters: CHR$(13) and CHR$(10).

PRINT #1, "FOO" writes 5 bytes to disk on Windows: FOO + CHR$(13) + CHR$(10). Linux and Mac write only 4 bytes: FOO + CHR$(10).
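One way to confirm the platform difference yourself (a minimal sketch; the filename is arbitrary) is to write one line and check the file length with LOF:

```basic
' Sketch: count the bytes one PRINT # line actually writes.
' On Windows each line ends with CHR$(13) + CHR$(10); on Linux/Mac just CHR$(10).
OPEN "eoltest.txt" FOR OUTPUT AS #1
PRINT #1, "FOO"
CLOSE #1
OPEN "eoltest.txt" FOR BINARY AS #1
PRINT LOF(1) ' 5 on Windows (FOO + CR + LF), 4 on Linux/Mac (FOO + LF)
CLOSE #1
```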
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

Offline NOVARSEG

  • Forum Resident
  • Posts: 509
    • View Profile
Re: Random Access to a Sequential File part 2
« Reply #2 on: June 02, 2021, 08:48:12 pm »
With files opened as BINARY, every PUT or GET advances the "file pointer".

You only need SEEK to set an initial position in the file.

For files opened as RANDOM, I'm not sure whether that rule holds.

 
Quote
Once I open a sequential file and use SEEK #1, Seek(1), it positions the pointer at the first byte in the first record.

When a file is first opened, the file pointer automatically points to the first byte in the file, which is 1 in BASIC (but 0 in assembler).

I think SEEK #1, SEEK(1) is the same as SEEK #1, 1 (when the file is first opened),

where the 1 is the record number. SEEK(1) returns the current record number.
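A minimal sketch of that equivalence (the filename and LEN are arbitrary):

```basic
' Sketch: on a freshly opened RANDOM file, SEEK(1) is already 1,
' so SEEK #1, SEEK(1) and SEEK #1, 1 point at the same record.
OPEN "test.dat" FOR RANDOM AS #1 LEN = 128
PRINT SEEK(1) ' prints 1 right after opening
SEEK #1, 1    ' same position, stated explicitly
CLOSE #1
```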

Quote
There are 501 entries or records, and moving forward from record 1 to the last record is a simple matter of SEEK #1, SEEK(1) - 4 + 4. I believe this means each record is 4 bytes apart.

SEEK #1, SEEK(1) - 4 + 4: I'm not sure this does anything.

SEEK(1) finds the current record number, but you are adding -4 + 4 = 0.

You shouldn't have to use SEEK #1 to increment the record number, because GET or PUT should advance it. I will have to try some code to make sure.

However, to decrement the record number try

Seek #1,Seek(1)-1

If it turns out that GET or PUT has no effect on the record number then

increment record number
Seek #1,Seek(1)+1

decrement record number
Seek #1,Seek(1)-1



« Last Edit: June 02, 2021, 09:12:46 pm by NOVARSEG »

Offline NOVARSEG

  • Forum Resident
  • Posts: 509
    • View Profile
Re: Random Access to a Sequential File part 2
« Reply #3 on: June 03, 2021, 03:41:04 am »
Tested some code. GET and PUT auto-advance the record number. To go backwards (decrement the record number), use

SEEK #1, SEEK(1) - 1

Quote
DIM A AS STRING
DIM B AS STRING
DIM C AS STRING

A = "abc"
B = "def"

OPEN "test.txt" FOR RANDOM AS #1
PUT #1, , A

PUT #1, , B

SEEK #1, SEEK(1) - 1
GET #1, , C
PRINT C

SEEK #1, SEEK(1) - 1
GET #1, , C
PRINT C

CLOSE

WHOOPS

Quote
DIM A AS STRING
DIM B AS STRING
DIM C AS STRING
DIM D AS STRING

A = "abc"
B = "def"
C = "ghi"

OPEN "test.txt" FOR RANDOM AS #1
PUT #1, , A
PUT #1, , B
PUT #1, , C

SEEK #1, SEEK(1) - 1 'decrements record number  (initial )
GET #1, , D 'increments record number
PRINT D

SEEK #1, SEEK(1) - 2 ' so -2 counteracts the previous GET
GET #1, , D  'increments record number
PRINT D

SEEK #1, SEEK(1) - 2 ' so -2 counteracts the previous GET
GET #1, , D  'increments record number
PRINT D
CLOSE



test.txt is 261 bytes!

« Last Edit: June 03, 2021, 04:09:34 am by NOVARSEG »

Offline Dimster

  • Forum Resident
  • Posts: 500
    • View Profile
Re: Random Access to a Sequential File part 2
« Reply #4 on: June 03, 2021, 10:23:25 am »
Thanks NOVARSEG. I am going to play with your suggestions. It hadn't occurred to me that SEEK(??) may not be needed, just a comma and a value.

I'm not using RANDOM or BINARY in this adventure; I'm sticking strictly with sequential, and consequently using INPUT # rather than GET.

Moving forward sequentially, after an INPUT # it seemed I needed a -4 to move the pointer back to the byte that began the INPUT, thus the -4 in the SEEK for the next record. But the next record (again going forward in a sequential file) begins 4 bytes away from the record just read, thus the +4. So SEEK(1) - 4 + 4 appeared to work more often than not. I haven't tried SEEK(1) + 0, but I did try a simple SEEK(1) to move to the next INPUT and ran into some zero values being read. Perhaps just the comma and the number of bytes to move forward, as you suggested, would work better.

Going backward appears to be quite different. Steve pointed out some instances where a sequential record can carry more than 4 bytes; I had thought all sequential records holding integers were strictly bound to that size. After Steve pointed this out, I started from the length-of-file value and worked backward byte by byte. Where going forward 4 bytes per record seemed to get me to the end of the file successfully, going backward it turns out the size of the stored value matters. So, from the last byte in a sequential file, if the value stored is 10 to 99, or 100 to 999, or 1000 to 9999, then the byte count changes. For example, if the last value stored is record #501 with a value of 2345, it takes a -6 to move to the beginning of that last record, then a further -6 to get to the second-to-last record.

If, say, record 385 contains the value 625, then after inputting that value, getting back to the beginning of record 385 takes a -5, then a further -5 to get to the beginning of record 384. Values of 99 or less are back to the -4.
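That width pattern can be measured directly (a sketch, assuming the file holds one value per line; "values.txt" is a hypothetical name): note SEEK(1) before and after each INPUT # and print the difference, which is the digit count plus the end-of-line bytes.

```basic
' Sketch: measure how many bytes each sequential "record" occupies
' by watching the byte position before and after each INPUT #.
DIM v AS SINGLE, before AS LONG
OPEN "values.txt" FOR INPUT AS #1
DO WHILE NOT EOF(1)
    before = SEEK(1)
    INPUT #1, v
    PRINT v; "occupies"; SEEK(1) - before; "bytes"
LOOP
CLOSE #1
```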

As the famous Roseanne Roseannadanna would say, "It's always something."

Offline NOVARSEG

  • Forum Resident
  • Posts: 509
    • View Profile
Re: Random Access to a Sequential File part 2
« Reply #5 on: June 03, 2021, 11:02:58 pm »
@Dimster

I've been reading the QB64 wiki on files opened as RANDOM, and it looks like the concept of a "record" applies only to files opened as RANDOM.

I would try RANDOM instead of sequential. RANDOM mode lets you store variable-length data inside fixed-size records, and there is no guesswork to find a record: simply enter the record number.
« Last Edit: June 03, 2021, 11:14:56 pm by NOVARSEG »

Offline Dimster

  • Forum Resident
  • Posts: 500
    • View Profile
Re: Random Access to a Sequential File part 2
« Reply #6 on: June 04, 2021, 10:27:09 am »
@NOVARSEG - you are of course correct; why would anyone want random access to a sequentially written file? Well, long story short... I have tons of data in tons of sequential files. Retrieving the info from those files is faster sequentially versus randomly... IF I can avoid all those OPEN/CLOSE commands. It seemed to me that if I opened all the files I wanted to work with on a particular day at once, and could then go forward and backward in each opened file to search for data, it would be a random search in a sequential file. Where RANDOM stores records all over the place with an index to locate each record, a sequential file just needs the ability to know where the pointer has to be to access any piece of data.

I'm not sure of the difference in storage space between a randomly created file and a sequential file, but I think the sequential one takes less room. At the moment storage isn't a real issue, but down the road each of my sequential files is growing by 50 to 100 additional data items per year.

The thing is... I don't know if there is a speed improvement in retrieving data via random access to a sequential file, rather than just using RANDOM as you suggest. At the moment I have 50 to 60 sequential files which my AI program is expected to address.

Offline NOVARSEG

  • Forum Resident
  • Posts: 509
    • View Profile
Re: Random Access to a Sequential File part 2
« Reply #7 on: June 05, 2021, 12:17:06 am »
Try putting all the data files you have into one file.   A file opened as RANDOM should be fast enough.  A file opened as BINARY is a bit faster but requires far more coding to get it to work properly.

For an AI application a single file opened as RANDOM is all you need.

OPEN "DATA.txt" FOR RANDOM AS #1 LEN = 256

where the max length of each record is 256 bytes

There is another way: store pairs of data name and data, separated by control characters.

record$ = "data name1" + CHR$(1) + "data1" + CHR$(2)
record$ = record$ + "data name2" + CHR$(1) + "data2" + CHR$(2)
record$ = record$ + "data name3" + CHR$(1) + "data3" + CHR$(2)

and so on.

That way you can save variable-length strings or numbers (converted to strings) in any order, as long as the total record length is not exceeded. If a record's format changes later, it does not matter what the format is, because the CHR$(1) and CHR$(2) separators let the code parse the data fields and provide the required output.

The record is just a place to store data. It is up to the processing code to make sense of what's in record$.

PUT #1, , record$

SEEK #1, SEEK(1) - 1

GET #1, , record$

PRINT record$

Process what is in record$ to extract the original data.

And 256 bytes for a record is tiny; all the data you have should fit into a single 1 MB file.


« Last Edit: June 05, 2021, 12:33:44 am by NOVARSEG »

Offline NOVARSEG

  • Forum Resident
  • Posts: 509
    • View Profile
Re: Random Access to a Sequential File part 2
« Reply #8 on: June 05, 2021, 01:59:00 am »
Here is a code example of the RANDOM / BINARY method.

The beauty of this method is that data can be saved in any format and processed in any way that suits the application. The RANDOM file mode also makes it easy to access the file as records (in this case, 256-byte records).

Code: QB64: [Select]
OPEN "test.txt" FOR RANDOM AS #1 LEN = 256

record$ = "race 1" + CHR$(1) + "23 seconds" + CHR$(2)
record$ = record$ + "race 2" + CHR$(1) + "67 seconds" + CHR$(2)
record$ = record$ + "race 3" + CHR$(1) + "25 seconds" + CHR$(2)
record$ = record$ + "car type" + CHR$(1) + "Ford" + CHR$(2)

PUT #1, , record$

SEEK #1, SEEK(1) - 1

GET #1, , record$

P = 0
P1 = 0
DO
    P = INSTR(P + 1, record$, CHR$(2)) ' find the end of the next field
    IF P = 0 THEN EXIT DO
    D$ = MID$(record$, P1 + 1, P - P1 - 1) ' one name/value pair
    P1 = P
    n = INSTR(1, D$, CHR$(1)) ' split name from value
    L$ = LEFT$(D$, n - 1)
    R$ = RIGHT$(D$, LEN(D$) - n)
    PRINT L$ + " = " + R$
LOOP
CLOSE #1



« Last Edit: June 05, 2021, 04:28:34 am by NOVARSEG »

Offline Dimster

  • Forum Resident
  • Posts: 500
    • View Profile
Re: Random Access to a Sequential File part 2
« Reply #9 on: June 06, 2021, 08:07:27 am »
@NOVARSEG. Wow... all the data files into 1 large one. There are about 50 sequential files, each with on average 500 data items = 25,000, and if that total grows by say 100 additional pieces of data per year, it may take a while to find things. But I understand the concept... random access would be the better route for reaching individual items if all the data is in one large file, rather than sequential access in either one large file or multiple files, plus the added advantage of avoiding all the OPEN/CLOSE commands to switch between multiple files. Lots to think about here. Thanks for the INPUT, or should that be thanks for the GET.

Offline NOVARSEG

  • Forum Resident
  • Posts: 509
    • View Profile
Re: Random Access to a Sequential File part 2
« Reply #10 on: June 06, 2021, 05:26:25 pm »
I was playing with RANDOM files and the usual TYPE data format.

GET or PUT won't work if the items in a TYPE are variable length, like

TYPE info
    race1 AS STRING
    race2 AS STRING
END TYPE

GET or PUT wants

TYPE info
    race1 AS STRING * whatever
    race2 AS STRING * whatever
END TYPE

All TYPE does is impose a very exacting data format. That is OK, but what if the format changes later?

The RANDOM / BINARY mode requires a bit more coding but is far superior when data formats need to change. And TYPE is not needed.

example
Code: QB64: [Select]
TYPE info
    race1 AS STRING
    race2 AS STRING
    carType AS STRING
END TYPE

DIM REC AS info

OPEN "test.txt" FOR RANDOM AS #1 LEN = 256

PUT #1, , REC
returns the error

"UDT must have fixed size"

which is to be expected. If there were no error, data items could get overwritten quite easily.

I'm not sure what the 50 files contain, but let's say each file has data that the other files don't. In theory, each file could be a record.

OPEN "test.txt" FOR RANDOM AS #1 LEN = 5000

I'm guessing your file sizes are in the range of 5000 bytes. The maximum record size is 32767 bytes.

All the data could be in one large file, with each record simulating what used to be in a file.
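A minimal sketch of that layout (the filename and the LEN = 5000 figure are assumptions carried over from above):

```basic
' Sketch: one fixed-length record per former data file.
DIM rec AS STRING * 5000 ' fixed-length, matches LEN below
OPEN "DATA.txt" FOR RANDOM AS #1 LEN = 5000
rec = "everything that used to live in file 1" ' space-padded to 5000 bytes
PUT #1, 1, rec ' record 1 stands in for old file 1
GET #1, 1, rec ' fetch it back directly by record number
CLOSE #1
```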
« Last Edit: June 06, 2021, 09:48:41 pm by NOVARSEG »

Offline Dimster

  • Forum Resident
  • Posts: 500
    • View Profile
Re: Random Access to a Sequential File part 2
« Reply #11 on: June 08, 2021, 03:31:30 pm »
I have a semblance of Random access to a Sequential file and I'm embarrassed as to how complicated I thought the solution was. All that baloney about calculating the bites going forward and backward. I think the code I'm using will be fine for what I want it to do and a lot faster that Opening and Closing each file during a search. I have to say though, if Norarseg hadn't given me the SEEK #1,1 then I think I would still be at it. So thanks Noraseq .