readString vs readLine - go

I am writing an application to read from a list of files, line by line and do some processing. I want to use as little RAM as I can.
I came across this question https://stackoverflow.com/a/41741702/3531263
Where the poster is saying readString uses more RAM than readLine and they have posted some code.
What I don't understand is how one uses more RAM? Because ultimately, the way their code is written, they are still writing an entire line to their buffer. So would that not mean if they had just used readString, it would have been the same thing?

the way their code is written, they are still writing an entire line to their buffer
Their code, yes. Your code might not need the whole line to be in memory at the same time. For example, your program is filtering a log file by request id, which is in the beginning of the line. It doesn't need to read the whole line which may be a few megabytes or more, only to reject it due to wrong request id. But with ReadString you don't have the luxury of choice.

I 'gree with Sergio. Also, have a look at the current implementation in the standard library. ReadLine calls ReadSlice('\n') once, then runs through a few branches to make sure the appropriate sentinel values or errors are returned with the converted data. On the other hand, ReadBytes and ReadString both loop over repeated calls to ReadSlice(delim), so it follows that they would necessarily be copying at least as much data into memory as ReadLine, and potentially much more if the delimiter wasn't found in the first call.

Related

Protobuffers and Golang --- writing out marshal'ed structs and reading back in

Is there a generally accepted "correct" way for writing out and reading back in marshaled protocol buffer messages from a file?
I've been working on a smaller project that simulates a full network locally with gRPC and am trying to add writing to/ reading from files s.t. I can save state and start from there when its launched again. It seems I was naive in assuming these would remain on a single line:
Sees chain of length 3
from debugging messages I've written; but,
$ wc test.dat
7 8 2483 test.dat
So, I suppose there are an extra 4 newline's... Is there a method of delimiting these that I can use? or do I need to come up with one on my own? I realize this is straightforward, but in my mind, I can only probabilistically guarantee that <<<<DELIMIT>>>> or whatever will never show up and put me back at square 1.
Use proto.Marshal/Unmarshal:
That way you simulate (closest) to receiving the message while avoiding side effects from other Marshal methods.
Alternative: Dump it as []byte and reread it.

Accessing memory location using pseudo "file handle" in MATLAB

There's lots of questions relating to dealing with large data sets by avoiding loading the whole thing into memory. My question is kind of the opposite: I've written code that reads files line by line to avoid memory overflow problems. However, I've just been given access to a powerful workstation with several hundred GB of memory, removing that problem, and making disk-access into the bottleneck.
Thing is, my code is written to access data files line by line using functions like fgetl. Is it possibly for me to somehow replace the file handle f = fopen('datafile.txt') with something else that acts in exactly the same way with respect to functions reading from a file, but instead of reading from the disk just returns values stored in memory?
I'm thinking, for example, having a large cell array with the contents of the file split by line and fgetl just returns the next. If I have to write my own wrapper for this, how can I go about doing this?

two programs accessing one file

New to this forum - looks great!
I have some Processing code that periodically reads data wirelessly from remote devices and writes that data as bytes to a file, e.g. data.dat. I want to write an Objective C program on my Mac Mini using Xcode to read this file, parse the data, and act on the data if data values indicate a problem. My question is: can my two different programs access the same file asynchronously without a problem? If this is a problem can you suggest a technique that will allow these operations?
Thanks,
Kevin H.
Multiple processes can read from the same file at a time without any problem. A process can also read from a file while another writes without problem, although you'll have to take care to ensure that you read in any new data that was written. Multiple processes should not write to the same file at at the same time, though. The OS will let you do it, but the ordering of data will be undefined, and you'll like overwrite data—in general, you're gonna have a bad time if you do that. So you should take care to ensure that only one process writes to a file at a time.
The simplest way to protect a file so that only one process can write to it at a time is with the C function flock(), although that function is admittedly a bit rudimentary and may or may not suit your use case.

Incrementally reading logs

Looked around with numerous search strings but can't find anything quite like this:
I'm writing a custom log parser (ala analog or webalizer except not for webserver) and I want to be able to skip the hard work for the lines that have already been parsed. I have thought about using a history file like webalizer but have no idea how it actually works internally and my C is pretty poor.
I've considered hashing each line and writing the hashes out, then parsing the history file for their presence but I think this will perform poorly.
The only other method I can think of is storing the line number of the last parse and skipping until that number is reached the next time round. What happens when the log is rotated I am not sure.
Any other ideas would be appreciated. I will be writing the parser in ruby but tips in a similar language will help as well.
The solutions I can think of right now are bound to be brittle.
Even if you store the line number and later realize it would be past the length of the current file, what happens if old lines have been trimmed? You would start reading (well) after the last position.
If, on the other hand, you are sure your log files won't be tampered with and they will only be rotated, I only see two ways of doing what you want, and I'm not sure the second is applicable to you.
Anyway, here goes.
First solution
You store the last line you parsed along with a timestamp. At the next run, you consider all the rotated log files sorting them by their last modified date, figure out which one you read last time, and start reading from there.
I didn't think this through, there might be funny corner cases you will need to handle.
Second solution
You create a background script that continuously watches the log file. A quick search on Google turned out this gem, but I'm not sure if that's even an option for you. Even then, you might want to integrate this solution with the previous one just in case your daemon will get interrupted (because that's clearly bound to happen at some point).
As you read the file and parse the lines keep track of the byte count. Save that. On next read, try to seek to that byte offset in the file. If the file is smaller than the byte count, it's a new file so start at the beginning.

Implementation of Fifo in GNU-GUILE

I would like to do the following :
I want to imple,ment the concept of FIFO in normal files using GUILE.
Two processes should communicate via a normal text file, that a third process , if needed, can access.
The subordinate of the original two processes should write in the file, line after line, that is append. So far so good. (implemented in c++)
The master proces however, should treat this file as a FIFO, it should read the first line, and do somethong corresponding to it, and delete the first line leaving the rest intact.
The problems are :
While the Master is accessing the file, the subordinate may come to a point where it must write there, leading to a conflict.
Popping the first line may need reading the whole ile out, in a string, poping the first thereof, and then saving it, which is memory intensive, and the second saving action may again conflict with the child trying to write there,
I wanted to implement this in GUILE, because since it is the official OS extension language, there might be better ways which addresses the above two issues.
But in the web I do not find much to orient myself. Please help, sorry for the lewss than concrete question, then I dont have a code snippet to show.

Resources