SMcNeill (QB64 Developer, Forum Resident)
File To Array
« on: January 27, 2021, 10:43:15 am »
Folks are constantly asking for the quickest way to load a large file into an array for their needs. Usually, we start off simple by just telling them, "Use FOR BINARY rather than FOR INPUT in your code. It saves a ton of time!"

Yet, there's always somebody who then comes along and says, "Is there a way faster than that?"

And the answer has always been: "Load it all at once and parse it."

BUT NOT ANYMORE!!  I NOW BRING YOU THE LATEST, THE GREATEST, THE BESTEST NEW.....

FileToArray!!

Code: QB64:
REDIM lines(0) AS STRING
DEFLNG A-Z

'make a quick file for testing purposes
PRINT "Creating data file.  Please wait a moment....."
OPEN "temp.txt" FOR OUTPUT AS #1
FOR i = 1 TO 1000000
    PRINT #1, "This is line #"; i; ". And here's my number: "; INT(RND * 10000000)
NEXT
CLOSE #1
PRINT "Data file created.  Now doing time tests...."


'Start the timer
t## = TIMER
FileToArray "temp.txt", lines()
t1## = TIMER
'the file ends with a line break, so the last element is an empty string
'and UBOUND - LBOUND equals the number of lines
count = UBOUND(lines) - LBOUND(lines)
PRINT USING "Loaded and parsed #,###,### lines, in ###.#### seconds."; count, t1## - t##

FOR i = LBOUND(lines) TO LBOUND(lines) + 9
    PRINT i; lines(i)
NEXT


'and here's an example using an array which starts at index 1, instead of 0

REDIM lines(1 TO 1) AS STRING
t## = TIMER
FileToArray "temp.txt", lines()
t1## = TIMER
count = UBOUND(lines) - LBOUND(lines)
PRINT USING "Loaded and parsed #,###,### lines, in ###.#### seconds."; count, t1## - t##

FOR i = LBOUND(lines) TO LBOUND(lines) + 9
    PRINT i; lines(i)
NEXT


PRINT "And to compare times..."
count = 1000000
t## = TIMER(0.001)
REDIM lines(1000000) AS STRING 'I'm going to cheat here and correctly size the lines array from the start
'               to shave off a little time constantly resizing it, and to keep my test code down to just
'               the time it takes for the program to load.
OPEN "temp.txt" FOR INPUT AS #1
FOR i = 1 TO 1000000
    LINE INPUT #1, lines(i)
NEXT
CLOSE #1
t1## = TIMER(0.001)
PRINT USING "Using FOR INPUT, I loaded and parsed #,###,### lines, in ###.#### seconds."; count, t1## - t##

t## = TIMER(0.001)
REDIM lines(1000000) AS STRING 'same cheat as above: size the array correctly from the start
OPEN "temp.txt" FOR BINARY AS #1
FOR i = 1 TO 1000000
    LINE INPUT #1, lines(i)
NEXT
CLOSE #1
t1## = TIMER(0.001)
PRINT USING "Using FOR BINARY, I loaded and parsed #,###,### lines, in ###.#### seconds."; count, t1## - t##

t## = TIMER(0.001)
FileToArray "temp.txt", lines()
t1## = TIMER(0.001)
PRINT USING "Using FileToArray, I loaded and parsed #,###,### lines, in ###.#### seconds."; count, t1## - t##


SUB FileToArray (file$, FileToArray_lines() AS STRING)
    'explicit types, so the SUB still works when $INCLUDEd into a program without DEFLNG A-Z
    DIM FileToArray_CRLF AS STRING
    DIM FileToArray_Handle AS LONG, count AS LONG, U AS LONG, L AS LONG, l1 AS LONG

    'load the whole file into one string
    FileToArray_Handle = FREEFILE
    OPEN file$ FOR BINARY AS #FileToArray_Handle
    temp$ = SPACE$(LOF(FileToArray_Handle))
    GET #FileToArray_Handle, 1, temp$
    CLOSE #FileToArray_Handle

    'find CRLF: test the two-character endings first, so a CRLF file isn't split on CR or LF alone.
    'CRLF is checked before LFCR because a CRLF file with a blank line also contains an LF+CR pair.
    IF INSTR(temp$, CHR$(13) + CHR$(10)) THEN
        FileToArray_CRLF = CHR$(13) + CHR$(10)
    ELSEIF INSTR(temp$, CHR$(10) + CHR$(13)) THEN
        FileToArray_CRLF = CHR$(10) + CHR$(13)
    ELSEIF INSTR(temp$, CHR$(10)) THEN
        FileToArray_CRLF = CHR$(10)
    ELSEIF INSTR(temp$, CHR$(13)) THEN
        FileToArray_CRLF = CHR$(13)
    END IF

    'if there are no line endings, then simply send back what we have as a single line.
    count = LBOUND(FileToArray_lines)
    IF FileToArray_CRLF = "" THEN REDIM FileToArray_lines(count TO count) AS STRING: FileToArray_lines(count) = temp$: EXIT SUB

    'parse into an array, keeping the caller's LBOUND and growing in chunks of 1,000,000 elements
    U = 1000000 + count
    REDIM FileToArray_lines(count TO U) AS STRING
    l1 = 1 'start searching at the first character of the file
    DO
        'grow the array BEFORE writing, so neither branch below can write past UBOUND
        IF count > U THEN U = U + 1000000: REDIM _PRESERVE FileToArray_lines(LBOUND(FileToArray_lines) TO U) AS STRING
        L = INSTR(l1, temp$, FileToArray_CRLF)
        IF L THEN
            FileToArray_lines(count) = MID$(temp$, l1, L - l1)
            l1 = L + LEN(FileToArray_CRLF)
            count = count + 1
        ELSE
            FileToArray_lines(count) = MID$(temp$, l1) 'whatever follows the last line ending
        END IF
    LOOP UNTIL L = 0
    'trim the unused elements off the end
    REDIM _PRESERVE FileToArray_lines(LBOUND(FileToArray_lines) TO count) AS STRING
END SUB

IT WORKS WITH ARRAYS DIMENSIONED FROM 0!  IT WORKS WITH ARRAYS DIMENSIONED FROM 1!  IT WORKS WITH ARRAYS WITH ANY SORT OF LBOUND THAT YOU CAN IMAGINE, AND THEN IT RESIZES ITSELF FROM THERE!!

Oh my Gawd!!

AND IT'S FAST!  IT'S BLAZING FAST!  AS THE SCREENSHOT BELOW SHOWS!!!

 
[Screenshot: SS.png]


AND FOR A LIMITED TIME ONLY, YOU CAN GET IT FOR THE LOW, LOW PRICE OF NOTHING, OR FOR THREE EASY PAYMENTS OF NOTHING!!

TAKE ADVANTAGE OF THIS OFFER AND DON'T LET IT PASS YOU BY NOW!!

*Post in honor of Billy Mays, one of the greatest salesmen ever to grace the realm of late-night infomercials.



But in all seriousness, guys, this is about as simple as it gets. REDIM a STRING array to hold your data, then call the SUB FileToArray with the name of the file and that array, and let it load and parse the file for you.

If anyone finds any little glitches, or cases where it fails to work, feel free to report them and I'll update the routine as I make corrections.

AND YES, PETE, IT DOES WORK IN SCREEN 0!
« Last Edit: January 27, 2021, 10:49:27 am by SMcNeill »
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

Pete (Forum Resident)
Re: File To Array
« Reply #1 on: January 27, 2021, 11:18:17 am »
Sorry Billy Steve, I've already used that method, and for quite some time, on various text projects. Oh well, I guess if you haven't seen it, it's new to you. (That's from Darrin Stevens, some TV ad guy who made that line famous when those Hollywood %^%$#s started shoving re-runs down our throats.) Now that's actually the fastest way to PARSE PETE OFF.

Pete

PS: I would not have come up with the control-character parsing method if you hadn't come up with the load-the-whole-file-at-once method many moons ago. I did thank you for that a few times, but since it's part of this topic, I'm happy to say thanks again! That was a real gem, free too, and lucky for me, I waited to get it on the have one get one free sale.
Want to learn how to write code on cave walls? https://www.tapatalk.com/groups/qbasic/qbasic-f1/

SMcNeill (QB64 Developer, Forum Resident)
Re: File To Array
« Reply #2 on: January 27, 2021, 12:17:56 pm »
Aye; nothing new here.  This is the same way I’ve been loading and parsing strings since the caveman ages of LET.  The only real difference is I’ve finally put the routine in a simple SUB to add to an $INCLUDE library.

Why I never did that before is beyond me... Honestly, I guess I’m just so used to typing it out that I simply never considered turning it into a SUB. Now that I have, though, I’m certain it’ll end up becoming part of my go-to toolset.  ;)

bplus (Global Moderator, Forum Resident)
Re: File To Array
« Reply #3 on: January 27, 2021, 01:27:57 pm »
Yeah, sorta been doing this already, and I already have the sweet little wonderful parser Split1000!
https://www.qb64.org/forum/index.php?topic=1607.0

 but this:
Code: QB64:
'find CRLF
IF INSTR(temp$, CHR$(13)) THEN FileToArray_CRLF = CHR$(13)
IF INSTR(temp$, CHR$(10)) THEN FileToArray_CRLF = CHR$(10)
IF INSTR(temp$, CHR$(13) + CHR$(10)) THEN FileToArray_CRLF = CHR$(13) + CHR$(10)
IF INSTR(temp$, CHR$(10) + CHR$(13)) THEN FileToArray_CRLF = CHR$(10) + CHR$(13)

would have saved me from loading files into arrays the slow way with FOR INPUT.
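One catch with the four checks quoted above: each later match overrides the earlier ones, and a CRLF file containing a blank line also contains an LF+CR pair (CR LF CR LF), so the last check picks LF+CR and the lines get split wrongly. A minimal sketch of an alternative, assuming the first line break in the data is representative of the whole file: find whichever of CR or LF comes first and look at the character right after it. `sample$`, `crlf$` and the other variable names are hypothetical stand-ins for the loaded file and the SUB's locals.

```qb64
'Sketch: detect the line ending from the FIRST line break in the data.
'sample$ stands in for the file contents loaded with GET (illustration only).
DIM sample$, crlf$, c$, nxt$
DIM p AS LONG, pCR AS LONG, pLF AS LONG

'a CRLF "file" with a blank line: the original checks would choose LF+CR here
sample$ = "first line" + CHR$(13) + CHR$(10) + CHR$(13) + CHR$(10) + "third line"

'position of the first CR or LF, whichever comes first (0 if neither exists)
pCR = INSTR(sample$, CHR$(13))
pLF = INSTR(sample$, CHR$(10))
p = pCR
IF pLF > 0 AND (pCR = 0 OR pLF < pCR) THEN p = pLF

IF p > 0 THEN
    c$ = MID$(sample$, p, 1)
    nxt$ = MID$(sample$, p + 1, 1) 'empty if the break is the very last character
    IF c$ = CHR$(13) AND nxt$ = CHR$(10) THEN
        crlf$ = CHR$(13) + CHR$(10)
    ELSEIF c$ = CHR$(10) AND nxt$ = CHR$(13) THEN
        crlf$ = CHR$(10) + CHR$(13)
    ELSE
        crlf$ = c$ 'a lone CR or a lone LF
    END IF
END IF

'report what was found; an empty crlf$ means no line endings at all
SELECT CASE crlf$
    CASE CHR$(13) + CHR$(10): PRINT "CRLF" 'prints: CRLF for this sample
    CASE CHR$(10) + CHR$(13): PRINT "LFCR"
    CASE CHR$(10): PRINT "LF"
    CASE CHR$(13): PRINT "CR"
    CASE ELSE: PRINT "no line endings"
END SELECT
```

Dropping this into FileToArray would mean assigning the result to FileToArray_CRLF instead of crlf$. A file that mixes line endings will still be split on only one kind.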

And if we ARE opening a file to read a .txt or .bas file, or to determine whether it's readable text and not binary, that would be a useful test to have too.

Update: Oh, one clue might be having a line over a certain length.
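A rough sketch of the text-versus-binary test described above, built on two assumptions made only for this example: any CHR$(0) byte means binary, and a line longer than MAX_LINE (an arbitrary 1,000 characters here) hints at binary data. Neither rule is definitive; a minified text file can have very long lines, and some binary files contain no NUL bytes. `sample$`, `isText` and `MAX_LINE` are hypothetical names.

```qb64
'Sketch: guess whether loaded data is text or binary.
'sample$ stands in for file contents loaded with GET; MAX_LINE is an arbitrary limit.
CONST MAX_LINE = 1000
DIM sample$
DIM isText AS LONG, longest AS LONG, start AS LONG, p AS LONG

sample$ = "PRINT " + CHR$(34) + "Hello" + CHR$(34) + CHR$(10) + "END" + CHR$(10)

isText = -1 'assume text until a test says otherwise
'NUL bytes almost never appear in plain text files
IF INSTR(sample$, CHR$(0)) THEN isText = 0

'find the longest LF-separated line (a CR just before the LF counts toward its length)
longest = 0
start = 1
DO
    p = INSTR(start, sample$, CHR$(10))
    IF p = 0 THEN p = LEN(sample$) + 1 'the last line runs to the end of the data
    IF p - start > longest THEN longest = p - start
    start = p + 1
LOOP WHILE start <= LEN(sample$)
IF longest > MAX_LINE THEN isText = 0

IF isText THEN PRINT "probably text" ELSE PRINT "probably binary"
PRINT "longest line:"; longest
```

For this sample the longest line is `PRINT "Hello"` at 13 characters, so it reports "probably text".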
« Last Edit: January 27, 2021, 02:00:09 pm by bplus »