Since these are variable-length fields, with varying numbers of fields per record, and the file itself is far too big to handle comfortably, here’s what I’d do:
1) Open the file for binary access and read what I consider a reasonable buffer. 100,000,000 bytes is a good size if you’re dealing with GB totals...
2) Use InStrRev to find the last CRLF in that 100 MB buffer. That’s the point where you stop processing on this pass and where the next read picks up, so you don’t split a record across buffers.
3) Parse the buffered data, using CRLF as the record delimiter and commas as the field delimiters.
4) Once you’ve parsed everything up to that last CRLF, repeat the process from the CRLF position you found in step 2, until the whole multi-GB file is handled (see the sketch right after this list).
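Here’s a rough sketch of that loop in VB6/VBA. The path and chunk size are placeholders, the data is assumed to be plain ANSI text, and rather than seeking back to the CRLF I just carry the partial record over into the next pass, which comes to the same thing without re-reading. Also note VB’s native LOF/Get positions are 32-bit Longs, so for a file genuinely past ~2 GB you’d swap the native I/O for API calls, but the chunking logic stays the same.

```vb
Sub ParseBigCsvInChunks()
    ' Chunked binary read: pull ~100 MB at a time, process only up to the
    ' last complete CRLF, and carry the leftover partial record forward.
    Const CHUNK_SIZE As Long = 100000000
    Dim fNum As Integer
    Dim bytesLeft As Long, bytesToRead As Long
    Dim buf() As Byte
    Dim carry As String, chunk As String
    Dim lastCrLf As Long
    Dim recs() As String, fields() As String
    Dim i As Long

    fNum = FreeFile
    Open "C:\data\bigfile.csv" For Binary Access Read As #fNum   ' placeholder path
    bytesLeft = LOF(fNum)

    Do While bytesLeft > 0
        bytesToRead = CHUNK_SIZE
        If bytesLeft < bytesToRead Then bytesToRead = bytesLeft
        ReDim buf(1 To bytesToRead)
        Get #fNum, , buf                          ' sequential read, no seeking
        bytesLeft = bytesLeft - bytesToRead

        ' Prepend whatever was left over after the last CRLF of the previous pass
        chunk = carry & StrConv(buf, vbUnicode)   ' assumes single-byte (ANSI) data

        If bytesLeft > 0 Then
            lastCrLf = InStrRev(chunk, vbCrLf)    ' step 2: last complete record
            carry = Mid$(chunk, lastCrLf + 2)     ' partial record for next pass
            chunk = Left$(chunk, lastCrLf - 1)
        Else
            carry = ""                            ' final pass: take everything
        End If

        ' Step 3: split into records, then fields
        recs = Split(chunk, vbCrLf)
        For i = LBound(recs) To UBound(recs)
            If Len(recs(i)) > 0 Then
                fields = Split(recs(i), ",")
                ' ... do whatever you need with fields() here ...
            End If
        Next i
    Loop

    Close #fNum
End Sub
```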
INPUT # reads sequentially from the disk, a single byte at a time. It’s SLOOOOOOOOOW. Binary reads come in at the size of your disk clusters/sectors (usually 4096+ bytes per pass nowadays), so read times are a FRACTION of what INPUT # needs, and parsing from memory is much faster than parsing from disk. It’ll do the job in a fraction of the time.
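For contrast, the sequential approach being compared against is the usual one-record-per-loop read, roughly like this (path again a placeholder):

```vb
    ' Sequential read: one record per pass through the loop.
    Dim fNum As Integer, oneLine As String
    fNum = FreeFile
    Open "C:\data\bigfile.csv" For Input As #fNum
    Do While Not EOF(fNum)
        Line Input #fNum, oneLine      ' or Input #fNum, field1, field2, ...
        ' parse oneLine here
    Loop
    Close #fNum
```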
Personally, I don’t see why you couldn’t read the whole file in one go and then process it. 3,000,000 records of 600 characters is 1.8 GB, and most machines can handle that readily. I have a few applications that load large datasets into memory all at once and use 22 GB of RAM, and I’ve never had an issue with them on 32 GB of total system RAM... As long as your PC has enough memory, there’s no problem loading in one go and then parsing.
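If you do go the load-it-all-at-once route, the chunked sketch above collapses to a single read, something like the snippet below. Caveats: 1.8 GB only just squeaks under the 2 GB ceiling of LOF and Long array bounds, and converting 1.8 GB of bytes to a Unicode VB string roughly doubles it in memory, so this really wants a 64-bit host with plenty of RAM.

```vb
Sub ParseWholeCsv()
    Dim fNum As Integer, buf() As Byte
    Dim recs() As String, fields() As String, i As Long

    fNum = FreeFile
    Open "C:\data\bigfile.csv" For Binary Access Read As #fNum   ' placeholder path
    ReDim buf(1 To LOF(fNum))
    Get #fNum, , buf                       ' one read for the whole file
    Close #fNum

    recs = Split(StrConv(buf, vbUnicode), vbCrLf)   ' assumes ANSI text, CRLF records
    For i = LBound(recs) To UBound(recs)
        If Len(recs(i)) > 0 Then
            fields = Split(recs(i), ",")
            ' ... process fields() ...
        End If
    Next i
End Sub
```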