Lazarus: Reading small text files into memory to process line by line

I have 400 small text files (less than 30 KB each) that I wish to parse. The number of lines per file varies from 100 to about 250. Line length varies from 8 characters to about 1200 characters.
My present program reads through the directory, opens each file in turn and then uses readln to parse each file line by line.
What I would like to do is read each file once into memory and then have some way to access and parse each line while the whole file is in memory.
Can someone suggest which Lazarus functions would be best to use to accomplish this?
Thank you.

As #500 - Internal Server Error mentioned, loading each text file into a TStringList is the easiest way to do this:
MyList := TStringList.Create;
MyList.LoadFromFile('file.txt');

#nepb, just don't forget to free MyList after you have finished with it; a try..finally block guarantees the free even if an exception occurs:
MyList := TStringList.Create;
try
  MyList.LoadFromFile('file.txt');
  //Do stuff with MyList
finally
  MyList.Free;
end;
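The "do stuff" part is where the line-by-line access happens: once loaded, every line of the file is available by index. A minimal sketch, with i declared as an Integer and ParseLine standing in as a hypothetical placeholder for whatever per-line parsing routine you already have:
for i := 0 to MyList.Count - 1 do
  ParseLine(MyList[i]); // MyList[i] is line i of the file, already in memory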

Alternatively, load them into a TMemoryStream if you are prepared to handle the line breaks yourself. In a memory stream you can access a byte at a time, which is ideal for parsing.
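A minimal sketch of that approach, assuming the Classes unit is in your uses clause, LF or CRLF line endings, and a file that ends with a newline (ParseLine is again a hypothetical stand-in):
var
  MyStream: TMemoryStream;
  P: PAnsiChar;
  i, LineStart, Len: Integer;
  Line: AnsiString;
begin
  MyStream := TMemoryStream.Create;
  try
    MyStream.LoadFromFile('file.txt'); // one read; the whole file is now in memory
    P := PAnsiChar(MyStream.Memory);
    LineStart := 0;
    for i := 0 to MyStream.Size - 1 do
      if P[i] = #10 then // LF marks the end of a line
      begin
        Len := i - LineStart;
        if (Len > 0) and (P[LineStart + Len - 1] = #13) then
          Dec(Len); // drop the CR of a CRLF pair
        SetString(Line, P + LineStart, Len);
        ParseLine(Line); // hypothetical per-line parser
        LineStart := i + 1;
      end;
  finally
    MyStream.Free;
  end;
end;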

Related

FB_FileGets vs FB_FileRead in TwinCAT

There are two similar function blocks for reading files in Beckhoff's TwinCAT software: FB_FileGets and FB_FileRead. I would appreciate it if someone could explain the differences between these functions and clarify when to use each of them. Do they have the same prerequisites, and are they used the same way in programs? Which one is faster at reading different file formats? Any information that makes them clearer for better programming would be welcome.
FB_FileGets reads the file line by line, so each call returns one line of the text file as a string. The maximum length of a line is 255 characters. Using this function block it is very easy to read all lines of a file: no buffers or memory copying are needed, if the 255-character line limit is acceptable.
FB_FileRead reads a given number of bytes from the file, so you can read files with, for example, 65000 characters on a single line.
I would use FB_FileGets in all cases where you know that the lines are shorter than 255 characters and you handle the data line by line. It's very simple to use. If you have no idea of the line sizes, you need all the data at once, or the file is very big, I would use FB_FileRead.
I haven't tested it, but I think FB_FileRead is probably faster, as it just copies bytes to a buffer, and you can read the whole file at once instead of line by line.

Missing characters in text file, output via VB6

When outputting a text file, it randomly (it seems) misses characters. I have this running on over 3000 sites daily, and roughly 20 sites have it occur on a semi-regular basis. The file ranges from 300 KB to 5 MB, the position where the problem occurs is not the same, and if the file is regenerated once the problem is noticed, the characters are not missing the second time.
The file is created with some incredibly simple VB6 code:
intfile = FreeFile
Open wFileName For Output As #intfile
Print #intfile, wContents
Close #intfile
If anyone has any ideas as to what could be causing it, that would be awesome, I'm completely stumped.
Thank you in advance for any advice!

Less memory-intensive way to count lines in a text file in VB

I have text files that may contain up to 40,000,000 lines of data (records).
I need a memory efficient way to count how many lines are in a file.
I have tried:
Dim this As Integer
' ReadAllLines loads every line of the file into memory at once
this = File.ReadAllLines(txtInPath.Text).Length
MsgBox(this)
where txtInPath is a text box on my form.
This code causes an out-of-memory exception.
Thanks
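One memory-friendly approach is to stream the file so that only the current line is ever held in memory. A minimal sketch, assuming the same txtInPath text box:
Dim count As Long = 0
Using reader As New System.IO.StreamReader(txtInPath.Text)
    ' ReadLine returns Nothing at end of file and holds only
    ' one line in memory at a time.
    While reader.ReadLine() IsNot Nothing
        count += 1
    End While
End Using
MsgBox(count)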

File-size and line-length limits for Windows batch files

I am generating a Windows batch file that might become quite large, say a few megabytes. I've searched for possible limits regarding the maximum file size and maximum length of a line in a batch file, but I couldn't find anything. Any practical experiences?
I think the file size can be anything up to 2 GB, perhaps even more. It's an interpreted language, so if this is done right, the file-size limit should be the file-size limit of the file system. I never had any errors with batch files being too large, and some of those I created were several MB in size.
There should be a line-length limit, but it must be more than 256. This can easily be tested: just do some "set A=123456789012...endofline", then "echo %A%", and you'll see how far you can go.
It works for me with very long lines (around 4K), but at 8K echo gives the message "Line too long", so 8192 bytes seems to be the limit.
I tested the file size too, with "echo off", thousands of set lines, and then "echo end": it worked for an 11 MB file (although it took some seconds to finish :) - no limit in sight here.
110 MB worked, too. Is this enough? ;)
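The doubling test described above is easy to script; a minimal sketch, using delayed expansion so the variable can grow inside the loop:
@echo off
setlocal EnableDelayedExpansion
set "A=0123456789"
rem Double the variable nine times: 10 * 2^9 = 5120 characters, still legal.
for /L %%i in (1,1,9) do set "A=!A!!A!"
echo !A!
rem A tenth doubling (10240 characters) would exceed the ~8191-character limit.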
The maximum length of any command line (or variable) within CMD is 8191 characters.
There are many different limits.
There is no (known) limit for the file itself, and code blocks seem to be unlimited as well.
The maximum size of a variable is 8191 characters.
Obviously the limit for a line has to be larger than that, so a maximal-sized variable can be assigned.
Even the variable name can be 8191 characters long.
It's possible to build a line with 16387 characters:
set <varname_8191_chars_long>=<content_8191_chars_long>
But the batch parser is able to read much longer lines (tested with 100k in one line).
It is required, though, that the effective length is < ~8191 characters after the percent expansion.
Like in
echo %======%%=====%%======%%=====% ..... %=====%END
This works because after the expansion the line contains only echo END.
(One bug of the parser: it drops one character at every multiple of 8192.)
Some ideas, not necessarily mutually exclusive:
Switch to PowerShell.
Switch to a data-driven application, so that all the variable stuff is kept in a data file (CSV, text, whatever), and as a result you can have a smaller, boilerplate script that opens the data file and operates.
It should work at least up to 2 GB. The lines are read directly from the BAT file on the disk (there is no caching involved). I make this statement because of the following: in fact, you can edit a BAT file while it is running! It will work even though a text editor may rename the original version and save the new version in a new location on the disk, as long as you are careful not to insert text above the currently executing command. Lines can be changed/inserted/deleted below the currently executing command, and the new lines will be the ones executed. I have often done this with BAT files containing a long list of wget commands, each taking tens of minutes to execute.
According to Microsoft, there is no limit to a batch file's size. However, a batch file line should not exceed 127 bytes or it will be truncated at execution.
See Maximum Line Length and Count for Batch Files & CONFIG.SYS

Modifying an IO stream in-place in Ruby

I've been writing a Ruby program that merges the content of two files.
For example, if a torrent has been downloaded two times separately, it tries to merge their contents for the blocks which have been completed.
So I've been looking for a method which modifies a stream only at the place required and saves only that block, instead of saving the whole stream again.
I'm reading the file in blocks of 16 KiB. How do I "replace" (not append) the content of one of those 16 KiB blocks so that only those bytes are written to disk and the whole file is not rewritten each time?
Kind of,
#Doesn't exist unfortunately.
#By default it appends instead of replacing, so file size grows.
IO.write(file_name, content, offset, :replace => true)
Does there exist a method which achieves that kind of functionality?
Open the file in "r+b" mode, seek to the location and just write to it:
f = File.new("some.existing.file", "r+b")
f.seek(1024)       # move to the byte offset to overwrite
f.write("test\n")  # writes in place; the rest of the file is untouched
f.close
This will overwrite five characters of the file, starting at offset 1024.
If the file is shorter than your seek offset, an appropriate number of null characters is inserted into the file.
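Applied to the 16 KiB blocks from the question, the same idea looks like this; the block number here is a made-up example, and the block form closes the file automatically:
BLOCK_SIZE = 16 * 1024
block_index = 3                     # hypothetical block number to replace
data = "x" * BLOCK_SIZE             # replacement bytes for that block
File.open("some.existing.file", "r+b") do |f|
  f.seek(block_index * BLOCK_SIZE)  # jump to the block's byte offset
  f.write(data)                     # overwrite exactly that block in place
end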
