Author Topic: Is there someway to speed up reading a text file  (Read 5559 times)


Offline MLambert

  • Forum Regular
  • Posts: 115
Is there someway to speed up reading a text file
« on: February 21, 2020, 04:18:14 am »
Hi,

Is there some way to increase the speed of INPUT #1, A$, B$, etc.?

Maybe increase the read buffer size ?

Thks,

Mike

Offline TerryRitchie

  • Seasoned Forum Regular
  • Posts: 495
  • Semper Fidelis
Re: Is there someway to speed up reading a text file
« Reply #1 on: February 21, 2020, 05:36:45 am »
In order to understand recursion, one must first understand recursion.

FellippeHeitor

  • Guest
Re: Is there someway to speed up reading a text file
« Reply #2 on: February 21, 2020, 06:56:17 am »
Hmm, no. Last I checked, binary mode would speed up LINE INPUT reads, but still not allow INPUT reads, as those remained exclusive to INPUT mode. I could be wrong.

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3972
    • Steve’s QB64 Archive Forum
Re: Is there someway to speed up reading a text file
« Reply #3 on: February 21, 2020, 07:31:30 am »
Fastest way is always to just read the whole file at once and then parse it.

OPEN "yourfile.txt" FOR BINARY AS #1
text$ = SPACE$(LOF(1))
GET #1, , text$
CLOSE

'Then parse text$ using appropriate CRLF and comma separators.
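A minimal sketch of that parse step (assuming comma-delimited records terminated by CRLF; the field handling is left as a comment) could look like:

Code: QB64:
crlf$ = CHR$(13) + CHR$(10)
start& = 1
DO
    p& = INSTR(start&, text$, crlf$)
    IF p& = 0 THEN EXIT DO 'no more complete records
    record$ = MID$(text$, start&, p& - start&)
    'split record$ on commas here, using INSTR in the same style
    start& = p& + 2 'skip past the CR and LF
LOOP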
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

Offline Pete

  • Forum Resident
  • Posts: 2361
  • Cuz I sez so, varmint!
Re: Is there someway to speed up reading a text file
« Reply #4 on: February 21, 2020, 10:28:25 am »
The thread Terry posted has some examples and explanation. What Steve posted is what I use to load the entire contents of a file all at once; I think he came up with that one a couple of years back. It's great for loading HTML pages. Anyway, if you do load the entire contents, be aware, as Steve pointed out, of the line control characters. Specifically, all stored text lines terminate in CHR$(13) + CHR$(10). So if I were loading an entire text file into my word processor app, I might want to parse out those characters. Something like...

DO UNTIL INSTR(a$, CHR$(13) + CHR$(10)) = 0
    a$ = MID$(a$, 1, INSTR(a$, CHR$(13) + CHR$(10)) - 1) + MID$(a$, INSTR(a$, CHR$(13) + CHR$(10)) + 2)
LOOP

Now my a$ variable is free of those control characters.

However, if you want to use those characters to read lines, it would go something like this...

Code: QB64:
' You will need to create a text file named "tmp.tmp" in your local QB64 directory to run this example.
IF NOT _FILEEXISTS("tmp.tmp") THEN PRINT "File not found.": END
OPEN "tmp.tmp" FOR BINARY AS #1
x$ = SPACE$(LOF(1))
GET #1, 1, x$
CLOSE #1

DO UNTIL INSTR(x$, CHR$(13) + CHR$(10)) = 0
    a$ = MID$(x$, 1, INSTR(x$, CHR$(13) + CHR$(10)) - 1)
    x$ = MID$(x$, LEN(a$) + 3)
    PRINT a$
LOOP

Parse out,

Pete

Want to learn how to write code on cave walls? https://www.tapatalk.com/groups/qbasic/qbasic-f1/

Offline TerryRitchie

  • Seasoned Forum Regular
  • Posts: 495
  • Semper Fidelis
Re: Is there someway to speed up reading a text file
« Reply #5 on: February 21, 2020, 01:42:29 pm »
My thinking was the poster could read the entire file in then parse it out as needed.

Offline Pete

  • Forum Resident
  • Posts: 2361
  • Cuz I sez so, varmint!
Re: Is there someway to speed up reading a text file
« Reply #6 on: February 21, 2020, 02:16:50 pm »
Ah Terry, that's what Steve was talking about. Am I missing something here?

Anyway, QB64's BINARY LINE INPUT is so fast that I really can't see any appreciable time difference between using it and loading the entire file and then parsing it out. Unless you want something toward the end of a very large file; then, sure, the load-it-all approach is faster.

Pete

Offline bplus

  • Global Moderator
  • Forum Resident
  • Posts: 8053
  • b = b + ...
Re: Is there someway to speed up reading a text file
« Reply #7 on: February 21, 2020, 03:03:53 pm »
I would use BINARY LINE INPUT unless I had to parse other stuff too; then I would use this:
Code: QB64:
FUNCTION fLineCnt (txtFile$, arr() AS STRING)
    DIM filecount%, b$
    filecount% = 0
    IF _FILEEXISTS(txtFile$) THEN
        OPEN txtFile$ FOR BINARY AS #1
        b$ = SPACE$(LOF(1))
        GET #1, , b$
        CLOSE #1
        REDIM _PRESERVE arr(1 TO 1) AS STRING
        Split b$, CHR$(13) + CHR$(10), arr()
        filecount% = UBOUND(arr)
    END IF
    fLineCnt = filecount% 'this function returns the number of lines loaded; 0 means the file did not exist
END FUNCTION

'notes: REDIM the array to be loaded before calling Split '<<<< IMPORTANT: must be a dynamic, empty array; any LBOUND works
'This SUB takes a given N-delimited string and delimiter$ and creates an array of N+1 strings, using the LBOUND of the given dynamic array to load.
'notes: loadMeArray() needs to be a dynamic string array; the SUB will not change the LBOUND of the array it is given. rev 2019-08-27
SUB Split (SplitMeString AS STRING, delim AS STRING, loadMeArray() AS STRING)
    DIM curpos AS LONG, arrpos AS LONG, LD AS LONG, dpos AS LONG 'fix: use the LBOUND the array already has
    curpos = 1: arrpos = LBOUND(loadMeArray): LD = LEN(delim)
    dpos = INSTR(curpos, SplitMeString, delim)
    DO UNTIL dpos = 0
        loadMeArray(arrpos) = MID$(SplitMeString, curpos, dpos - curpos)
        arrpos = arrpos + 1
        IF arrpos > UBOUND(loadMeArray) THEN REDIM _PRESERVE loadMeArray(LBOUND(loadMeArray) TO UBOUND(loadMeArray) + 1000) AS STRING
        curpos = dpos + LD
        dpos = INSTR(curpos, SplitMeString, delim)
    LOOP
    loadMeArray(arrpos) = MID$(SplitMeString, curpos)
    REDIM _PRESERVE loadMeArray(LBOUND(loadMeArray) TO arrpos) AS STRING 'get the UBOUND correct
END SUB

Offline MLambert

  • Forum Regular
  • Posts: 115
Re: Is there someway to speed up reading a text file
« Reply #8 on: February 25, 2020, 05:50:06 am »
Thks everyone for the input.

Loading the file into memory is impracticable as there are millions of transactions.

Now, reading the file as binary is interesting, but I would then have to break down each record into 400+ fields, and while I know that is in-memory work, I don't know if I would gain any time here.

I thought that maybe there was a way to increase the input buffer size of the input file.

Mike

Offline bplus

  • Global Moderator
  • Forum Resident
  • Posts: 8053
  • b = b + ...
    • View Profile
Re: Is there someway to speed up reading a text file
« Reply #9 on: February 25, 2020, 02:07:03 pm »
Are records fixed length?

Offline Pete

  • Forum Resident
  • Posts: 2361
  • Cuz I sez so, varmint!
Re: Is there someway to speed up reading a text file
« Reply #10 on: February 25, 2020, 03:16:24 pm »
Jumping ahead to what BPlus, I think, is thinking... Why not remake this file into a RANDOM ACCESS file? At least you can index those, within your program. Going to an indexed point is a lot faster than sifting through a file record by record from the start.

Are we clear, though, that OPEN "myfile" FOR BINARY AS #1 works the same as OPEN "myfile" FOR INPUT AS #1, except that FOR BINARY reads records much faster with LINE INPUT #1 than FOR INPUT does? In QBasic, we could never use LINE INPUT # with FOR BINARY; that is a special addition in QB64. It simply makes sequential file reading much, much faster than the traditional OPEN FOR INPUT QBasic reading method.
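As a minimal sketch of that pattern (the file name is a placeholder):

Code: QB64:
OPEN "myfile.txt" FOR BINARY AS #1
DO UNTIL EOF(1)
    LINE INPUT #1, record$ 'same statement as in INPUT mode, but much faster in BINARY mode
    'process record$ here
LOOP
CLOSE #1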

Also, you could load the file in chunks. Something like a$ = SPACE$(1000000): GET #1, 1, a$ ... parse it, and then GET #1, 1000001, a$ ... etc.
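A rough sketch of that chunked approach (chunk size and file name are arbitrary; note that a record can straddle a chunk boundary, which the parser would have to handle):

Code: QB64:
CONST CHUNK = 1000000
OPEN "myfile.txt" FOR BINARY AS #1
pos& = 1
DO WHILE pos& <= LOF(1)
    n& = LOF(1) - pos& + 1 'bytes remaining
    IF n& > CHUNK THEN n& = CHUNK
    a$ = SPACE$(n&)
    GET #1, pos&, a$ 'read the next chunk starting at pos&
    'parse a$ here; carry any partial record at the end over to the next chunk
    pos& = pos& + n&
LOOP
CLOSE #1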

Pete

Offline MLambert

  • Forum Regular
  • Posts: 115
Re: Is there someway to speed up reading a text file
« Reply #11 on: February 25, 2020, 06:15:57 pm »
The records are variable length.

I understand about random access ... but the files need to be sorted and then processed sequentially.... batch processing with key control breaks.

I wrote my own 'database' logic with random accesses but because of updating and deleting of the data this became too hard with the volume of data to be processed so I now use mysql for that part of the processing.

In regards to binary reads, my question is: do I save processing time, given all of the string manipulation I must perform to unpack the data into variable-length fields?

Reading the data in blocks would have to be in control-key block lengths, and these blocks may be 2 records or 2,000,000 records.

It is a statistical application that needs to process vast amounts of data to produce results. For example, from 4 fields I produce 1300 different calculations.

Each record read may have 150 of these 4 field groupings.

By the way I have used C++ and QB64 beats it hands down. When I have a year or two I will try assembler.

Thanks,

Mike

Offline Pete

  • Forum Resident
  • Posts: 2361
  • Cuz I sez so, varmint!
Re: Is there someway to speed up reading a text file
« Reply #12 on: February 25, 2020, 10:21:58 pm »
If I had a gun to my head, and had to decide on the spot, I'd use OPEN "myfile" FOR BINARY AS #1: LINE INPUT #1, a$ ... and check each record for what I was after. I really don't think loading in chunks and parsing them, as complex as this issue appears to be, would be any faster than using this QB64 BINARY file reading method.

Oh, looky thar at me aveetar. I has two guns to my head already!

 - Sam

Offline EricE

  • Forum Regular
  • Posts: 114
Re: Is there someway to speed up reading a text file
« Reply #13 on: February 26, 2020, 01:26:08 am »
We need some quantitative data.
Here is a rough program that reads a text file into memory and then searches for CR/LF pairs to find the lines it contains.
Then the disk file is opened and the LINE INPUT function is used to read each line it contains.

The text file used is "War and Peace" and is of size 3359548 bytes. There are 66055 lines of text contained in this file.

On my computer I got the following results.
Reading the file into memory takes so little time it cannot be measured using the TIMER function (0 seconds duration).
Reading all the lines when the file is in memory required only 0.055 seconds.
Reading all the lines when the file is on disk using the LINE INPUT function required 15.820 seconds.

Code: QB64:
' "War and Peace" test
' "http://www.gutenberg.org/files/2600/2600-0.txt"

file$ = "2600-0.txt"
CRLF$ = CHR$(13) + CHR$(10)

'----
starttime! = TIMER
fin% = FREEFILE
OPEN file$ FOR BINARY AS fin%
filesize& = LOF(fin%)
FileBuffer$ = SPACE$(filesize&)
GET fin%, , FileBuffer$
CLOSE fin%
endtime! = TIMER
PRINT "READING INTO MEMORY", filesize&, endtime! - starttime!

'----
linecount& = 0
bytecount& = 0
starttime! = TIMER
WHILE bytecount& < filesize&
    CrlfPos& = INSTR(bytecount& + 1, FileBuffer$, CRLF$)
    fileline$ = MID$(FileBuffer$, bytecount& + 1, CrlfPos& - bytecount& - 1)
    ' PRINT fileline$
    linecount& = linecount& + 1
    bytecount& = CrlfPos& + 1
WEND
endtime! = TIMER

PRINT "FILE IN MEMORY", bytecount&, linecount&, endtime! - starttime!

'----
fin% = FREEFILE
OPEN file$ FOR INPUT AS fin%
linecount& = 0
bytecount& = 0
starttime! = TIMER
DO UNTIL EOF(fin%)
    LINE INPUT #fin%, fileline$ 'read an entire text file line
    linecount& = linecount& + 1
    bytecount& = bytecount& + LEN(fileline$) + 2 'include ending CR,LF characters
LOOP
endtime! = TIMER
CLOSE fin%
PRINT "FILE LINE INPUT", bytecount&, linecount&, endtime! - starttime!
'----


Offline MLambert

  • Forum Regular
  • Posts: 115
Re: Is there someway to speed up reading a text file
« Reply #14 on: February 26, 2020, 03:16:37 am »
Thks again for the help.

In regards to binary reads: no one has answered my concern about the time spent unpacking the variables and extracting the data, compared to the 'normal' input of INPUT #1, A$, B$, etc., which would help me decide whether the binary read is worthwhile looking at.

Also, as previously explained, my files are huge and cannot be read into memory: say 3,000,000 records at maybe 600 characters long is a lot of memory. If I use virtual memory then I am up for page swapping, etc., and again I ask the question: how does this compare in processing speed?

I appreciate the input ... but maybe someone who wrote the QB64 code can tell me if I can increase the read buffer size ?

Thks all,

Mike