Create a rolling buffer in bash

I want to use curl to get a stream from a remote server and write it to a buffer. So far so good: I just do curl http://the.stream > /path/to/thebuffer. Thing is, I don't want this file to get too large, so I want to be able to delete the first bytes of the file while simultaneously appending to the end. Is there a way of doing this?
Alternatively if I could write n bytes to buffer1, then switch to buffer2, buffer3.. and when buffer x was reached delete buffer1 and start again - without losing the data coming in from curl (it's a live stream, so I can't stop curl). I've been reading up the man pages for curl and cat and read, but can't see anything promising.

There isn't any particularly easy way to do what you are seeking to do.
Probably the nearest approach creates a FIFO, and redirects the output of curl to the FIFO. You then have a program such as split or csplit reading the FIFO and writing to different files. If you decide that the split programs are not the tool, you may need to write your own variation on them. You can then decide how to process the files that are created, and when to remove them.
Note that curl will hang until there is a process reading from the FIFO. When the process reading the FIFO exits, curl will get either a SIGPIPE signal or a write error, either of which should stop it.
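A minimal sketch of the FIFO-plus-split idea, under some assumptions: a printf stands in for the live curl stream, and the temporary directory, the chunk. prefix and the 4-byte chunk size are illustrative choices rather than anything from the question.

```shell
#!/usr/bin/env bash
# Sketch: chunk a stream into fixed-size files through a FIFO.
workdir=$(mktemp -d)
fifo="$workdir/stream.fifo"
mkfifo "$fifo"

# Stand-in for the real producer, e.g.:
#   curl http://the.stream > "$fifo" &
printf 'abcdefghij' > "$fifo" &

# split reads the FIFO until the writer closes it and writes
# chunk.aa, chunk.ab, ... of at most 4 bytes each.
split -b 4 "$fifo" "$workdir/chunk."
wait

ls "$workdir"
```

With a real stream you would pick a much larger chunk size, and a separate loop (or cron job) could remove the oldest chunk files once they have been processed.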

Related

Protobuffers and Golang --- writing out marshal'ed structs and reading back in

Is there a generally accepted "correct" way for writing out and reading back in marshaled protocol buffer messages from a file?
I've been working on a smaller project that simulates a full network locally with gRPC and am trying to add writing to/reading from files so that I can save state and start from there when it's launched again. It seems I was naive in assuming these would remain on a single line:
Sees chain of length 3
from debugging messages I've written; but,
$ wc test.dat
7 8 2483 test.dat
So, I suppose there are an extra 4 newlines... Is there a method of delimiting these that I can use, or do I need to come up with one on my own? I realize this is straightforward, but in my mind, I can only probabilistically guarantee that <<<<DELIMIT>>>> or whatever will never show up and put me back at square 1.
Use proto.Marshal/Unmarshal:
That way you most closely simulate receiving the message, while avoiding side effects from other Marshal methods.
Alternative: Dump it as []byte and reread it.

bash: wait for specific command output before continuing

I know there are several posts asking similar things, but none address the problem I'm having.
I'm working on a script that handles connections to different Bluetooth low energy devices, reads from some of their handles using gatttool and dynamically creates a .json file with those values.
The problem I'm having is that gatttool commands take a while to execute (and do not always succeed in connecting to the devices, failing with "device is busy" or similar messages). These "errors" not only result in wrong data in the .json file, but also let later lines of the script keep writing to the file (e.g. adding an extra } or similar). An example of the commands I'm using would be the following:
sudo gatttool -l high -b <MAC_ADDRESS> --char-read -a <#handle>
How can I approach this in a way that I can wait for a certain output? In this case, the ideal output when you --char-read using gatttool would be:
Characteristic value/description: some_hexadecimal_data
This way I can make sure I am following the script line by line instead of having these "jumps".
grep allows you to filter the output of gatttool for the data you are looking for.
If you are actually looking for a way to wait until a specific output is encountered before continuing, expect might be what you are looking for.
From the manpage:
expect [[-opts] pat1 body1] ... [-opts] patn [bodyn]
waits until one of the patterns matches the output of a spawned
process, a specified time period has passed, or an end-of-file is
seen. If the final body is empty, it may be omitted.
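If expect is more than you need, the grep approach can be combined with a retry loop in plain bash. The following is only a sketch: read_handle is a hypothetical stand-in for the gatttool call (it fails twice, then succeeds), and the retry limit, the one-second delay and the sample value are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: retry a command until its output contains the expected line.

# Hypothetical stand-in for:
#   sudo gatttool -l high -b <MAC_ADDRESS> --char-read -a <#handle>
# A file counts the calls because $(...) runs in a subshell.
attempts_file=$(mktemp)
read_handle() {
    local n
    n=$(( $(wc -c < "$attempts_file") + 1 ))
    printf 'x' >> "$attempts_file"
    if [ "$n" -lt 3 ]; then
        echo "device busy" >&2
        return 1
    fi
    echo "Characteristic value/description: 01 02 03"
}

value=""
for try in 1 2 3 4 5; do
    output=$(read_handle 2>/dev/null)
    # grep exits non-zero unless the line we are waiting for is present.
    if line=$(grep 'Characteristic value/description:' <<<"$output"); then
        value=${line#*: }   # keep only the part after "description: "
        break
    fi
    sleep 1                 # back off before retrying
done

if [ -n "$value" ]; then
    echo "value=$value"
else
    echo "giving up after $try tries" >&2
fi
```

Only write to the .json file once value is non-empty, so a failed read can never leave a half-written entry behind.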

Emulating 'named' process substitutions

Let's say I have a big gzipped file data.txt.gz, but often the ungzipped version needs to be given to a program. Of course, instead of creating a standalone unpacked data.txt, one could use the process substitution syntax:
./program <(zcat data.txt.gz)
However, depending on the situation, this can be tiresome and error-prone.
Is there a way to emulate a named process substitution? That is, to create a pseudo-file data.txt that would 'unfold' into a process substitution zcat data.txt.gz whenever it is accessed. Not unlike a symbolic link forwards a read operation to another file, but, in this case, it needs to be a temporary named pipe.
Thanks.
PS. Somewhat similar question
Edit (from comments) The actual use-case is having a large gzipped corpus that, besides its usage in its raw form, also sometimes needs to be processed with a series of lightweight operations (tokenized, lowercased, etc.) and then fed to some "heavier" code. Storing a preprocessed copy wastes disk space, and repeatedly retyping the full preprocessing pipeline can introduce errors. At the same time, running the pipeline on-the-fly incurs only a tiny computational overhead, hence the idea of a long-lived pseudo-file that hides the details under the hood.
As far as I know, what you are describing does not exist, although it's an intriguing idea. It would require kernel support so that opening the file would actually run an arbitrary command or script instead.
Your best bet is to just save the long command to a shell function or script to reduce the difficulty of invoking the process substitution.
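As a rough sketch of the shell-function route: the function name corpus, the lowercasing step and the tiny sample file are made up for illustration, and cat stands in for ./program. gzip -dc is used in place of zcat here only because it behaves the same on GNU and BSD systems.

```shell
#!/usr/bin/env bash
# Sketch: wrap the preprocessing pipeline in one named function.
workdir=$(mktemp -d)
printf 'Hello World\nFOO bar\n' | gzip > "$workdir/data.txt.gz"

# Hypothetical pipeline: decompress, then lowercase.
corpus() {
    gzip -dc "$workdir/data.txt.gz" | tr '[:upper:]' '[:lower:]'
}

# Stand-in for: ./program <(corpus)
result=$(cat <(corpus))
echo "$result"
```

The pipeline is then typed once, in the function body, and every caller just writes <(corpus).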
There's a spectrum of options, depending on what you need and how much effort you're willing to put in.
If you need a single-use file, you can just use mkfifo to create the file, start up a redirection of your archive into the fifo, and pass the fifo's filename to whoever needs to read from it.
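The single-use case might look something like this. It is a sketch only: the sample archive is generated on the spot, and wc -l stands in for the program that would be handed the fifo's filename.

```shell
#!/usr/bin/env bash
# Sketch: a one-shot pseudo-file backed by a FIFO.
workdir=$(mktemp -d)
printf 'line one\nline two\n' | gzip > "$workdir/data.txt.gz"

fifo="$workdir/data.txt"
mkfifo "$fifo"

# Writer: blocks until a reader opens the FIFO, then streams once.
gzip -dc "$workdir/data.txt.gz" > "$fifo" &

# Stand-in for: ./program "$fifo"
line_count=$(wc -l < "$fifo")
wait
rm "$fifo"

# $(( )) strips the padding some wc implementations add.
echo "lines read: $((line_count))"
```

A second reader would block forever, because the writer has already exited. That is what makes this variant single-use.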
If you need to repeatedly access the file (perhaps simultaneously), you can set up a socket using netcat that serves the decompressed file over and over.
With "traditional netcat" this is as simple as while true; do nc -l -p 1234 -c "zcat myfile.tar.gz"; done. With BSD netcat it's a little more annoying:
# Make a dummy FIFO
mkfifo foo
# Use the FIFO to track new connections
while true; do cat foo | zcat myfile.tar.gz | nc -l 127.0.0.1 1234 > foo; done
Anyway once the server (or file based domain socket) is up, you just do nc localhost 1234 to read the decompressed file. You can of course use nc localhost 1234 as part of a process substitution somewhere else.
Depending on your needs, you may want to make the bash script more sophisticated for caching etc, or just dump this thing and go for a regular web server in some scripting language you're comfortable with.
Finally, and this is probably the most "exotic" solution, you can write a FUSE filesystem that presents virtual files backed by whatever logic your heart desires. At this point you should probably have a good hard think about whether the maintainability and complexity costs of where you're going really offset someone having to call zcat a few extra times.

Controlling an interactive command-line utility from a Cocoa app - trouble with ptys

What I'm trying to do
My Cocoa app needs to run a bunch of command-line programs. Most of these are non-interactive, so I launch them with some command-line arguments, they do their thing, output something and quit. One of the programs is interactive, so it outputs some text and a prompt to stdout and then expects input on stdin and this keeps going until you send it a quit command.
What works
The non-interactive programs, which just dump a load of data to stdout and then terminate, are comparatively trivial:
Create NSPipes for stdout/stdin/stderr
Launch NSTask with those pipes
Then, either
get the NSFileHandle for the other end of the pipe to read all data until the end of the stream and process it in one go when the task ends
or
Get the -fileDescriptors from the NSFileHandle of the other end of the output pipes.
Set the file descriptor to use non-blocking mode
Create a GCD dispatch source with each of those file descriptors using dispatch_source_create(DISPATCH_SOURCE_TYPE_READ, ...
Resume the dispatch source and handle the data it throws at you using read()
Keep going until the task ends and the pipe file descriptor reports EOF (read() reports 0 bytes read)
What doesn't work
Either approach completely breaks down for interactive tools. Obviously I can't wait until the program exits because it's sitting at a command prompt and never will exit unless I tell it to. On the other hand, NSPipe buffers the data, so you receive it in buffer-sized chunks, unless the CLI program happens to flush the pipe explicitly, which the one in my case does not. The initial command prompt is much smaller than the buffer size, so I don't receive anything, and it just sits there. So NSPipe is also a no-go.
After some research, I determined that I needed to use a pseudo-terminal (pty) in place of the NSPipe. Unfortunately, I've had nothing but trouble getting it working.
What I've tried
Instead of the stdout pipe, I create a pty like so:
struct termios termp;
bzero(&termp, sizeof(termp));
int res = openpty(&masterFD, &slaveFD, NULL, &termp, NULL);
This gives me two file descriptors; I hand the slaveFD over to an NSFileHandle, which gets passed to the NSTask for either just stdout or both stdout and stdin. Then I try to do the usual asynchronous reading from the master side.
If I run the program I'm controlling in a Terminal window, it starts off by outputting 2 lines of text, one 18 bytes long including the newline, one 22 bytes and with no newline for the command prompt. After those 40 bytes it waits for input.
If I just use the pty for stdout, I receive 18 bytes of output (exactly one line, ending in newline) from the controlled program, and no more. Everything just sits there after the initial 18 bytes, no more events - the GCD event source's handler doesn't get called.
If I also use the pty for stdin, I usually receive 19 bytes of output (the aforementioned line plus one character from the next line) and then the controlled program dies immediately. If I wait a little before attempting to read the data (or scheduling noise causes a small pause), I actually get the whole 40 bytes before the program again dies instantly.
An additional dead end
At one point I was wondering if my async reading code was flawed, so I re-did everything using NSFileHandles and its -readInBackgroundAndNotify method. This behaved the same as when using GCD. (I originally picked GCD over the NSFileHandle API as there doesn't appear to be any async writing support in NSFileHandle)
Questions
Having arrived at this point after well over a day of futile attempts, I could do with some kind of help. Is there some fundamental problem with what I'm trying to do? Why does hooking up stdin to the pty terminate the program? I'm not closing the master end of the pty, so it shouldn't be receiving EOF. Leaving aside stdin, why am I only getting one line's worth of output? Is there a problem with the way I'm performing I/O on the pty's file descriptor? Am I using the master and slave ends correctly - master in the controlling process, slave in the NSTask?
What I haven't tried
I so far have only performed non-blocking (asynchronous) I/O on pipes and ptys. The only thing I can think of is that the pty simply doesn't support that. (if so, why does fcntl(fd, F_SETFL, O_NONBLOCK); succeed though?) I can try doing blocking I/O on background threads instead and send messages to the main thread. I was hoping to avoid having to deal with multithreading, but considering how broken all these APIs seem to be, it can't be any more time consuming than trying yet another permutation of async I/O. Still, I'd love to know what exactly I'm doing wrong.
The problem is likely that the stdio library inside the command-line program is buffering its output. The output will only appear in the read pipe when the command-line program flushes it, either because it writes a "\n" via the stdio library, or fflush()s, or the buffer gets full, or it exits (which causes the stdio library to automatically flush any output still buffered), or possibly some other conditions. If those printf strings were "\n"-terminated, then you MIGHT get the output quicker. That's because there are three output buffering styles -- unbuffered, line-buffered (\n causes a flush), and block buffered (when the output buffer gets full, it's auto-flushed).
Buffering of stdout is line-buffered by default if the output file descriptor is a tty (or pty); otherwise, block buffered. stderr is by default unbuffered. The setvbuf() function is used to change the buffering mode. These are all standard BSD UNIX (and maybe general UNIX) things I've described here.
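To see which case a child process is in, you can check whether its stdout is a terminal. The sketch below only reports that condition from the shell; it does not change any program's buffering. describe_stdout is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: report whether stdout is a terminal, which is the condition
# a C program's stdio uses to pick its default buffering mode.
describe_stdout() {
    if [ -t 1 ]; then
        echo "tty: stdio would default to line buffering"
    else
        echo "not a tty: stdio would default to block buffering"
    fi
}

# Piping through cat makes the function's stdout a pipe, not a terminal.
piped=$(describe_stdout | cat)
echo "$piped"
```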
NSTask does not do any setting up of ttys/ptys for you. It wouldn't help in this case anyway since the printfs aren't printing out \n.
Now, the problem is that the setvbuf() needs to be executed inside the command-line program. Unless (1) you have the source to the command-line program and can modify it and use that modified program, or (2) the command-line program has a feature that allows you to tell it to not buffer its output [ie, call setvbuf() itself], there's no way to change this, that I know of. The parent simply cannot affect the subprocess in this way, either to force flushing at certain points or change the stdio buffering behavior, unless the command-line utility has those features built into it (which would be rare).
Source: Re: NSTask, NSPipe's and interactive UNIX command

2-way communication with background process (I/O)

I have a program that runs in the command line (i.e. $ run program starts up a prompt) that runs mathematical calculations. It has its own prompt that takes in text input and responds back through standard-out/error (or creates a separate X window if needed, but this can be disabled). Sometimes I would like to send it small input, and other times I send in a large text file filled with a series of input on each line. This program takes a lot of resources and also has a large startup time, so it would be best to only have one instance of it running at a time. I could keep the program prompt open and supply the input this way, or I can send the process an exit command (to leave the prompt), which just prints the output. The problem with sending the request with an exit command is that the program must start up each time (slow ...). Furthermore, the output of this program is sometimes cryptic and it would be helpful to filter the output in some way (e.g. simplify output, apply ANSI colors, etc).
This all makes me want to put some 2-way IO filter (or is that "pipe"? or "wrapper"?) around the program so that the program can run in the background as a single process. I would then communicate with it without having to restart. I would also like to have this all while filtering the output to be more user friendly. I have been looking all over for ideas and I am stumped at how to accomplish this in some simple shell-accessible manner.
Some things I have tried were redirecting stdin and stdout to files, but the program hangs (doesn't quit) and only reads the file once, making me unable to continue communication. I think this was because the prompt is waiting for some user input after the EOF. I thought that this could be set up as a local server, but I am uncertain how to begin accomplishing that.
I would love to find some simple way to accomplish this. Additionally, if you can think of a way to perform this, do you think there is a way to also allow for attaching or detaching to the prompt by request? Any help and ideas would be greatly appreciated.
You could create two named pipes (man mkfifo) and redirect input and output:
myprog < fifoin > fifoout
Then you could open new terminal windows and do this in one:
cat > fifoin
And this in the other:
cat < fifoout
(Or use tee to save the input/output as well.)
To dump a large input file into the program, use:
cat myfile > fifoin
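Here is a self-contained sketch of that setup in a single script. The myprog function is a hypothetical stand-in for the real program (it evaluates one arithmetic expression per line until it reads exit). Holding the input fifo open on file descriptor 3 is an addition of mine, so that the program does not see EOF when an individual writer such as cat myfile > fifoin finishes.

```shell
#!/usr/bin/env bash
# Sketch: talk to one long-running process through two named pipes.
workdir=$(mktemp -d)
mkfifo "$workdir/fifoin" "$workdir/fifoout"

# Hypothetical stand-in for the slow-starting interactive program.
myprog() {
    while IFS= read -r expr; do
        [ "$expr" = "exit" ] && break
        echo "result: $((expr))"
    done
}

myprog < "$workdir/fifoin" > "$workdir/fifoout" &

# Keep a writer open on fd 3 so myprog never sees EOF between inputs.
exec 3> "$workdir/fifoin"

# Reader: collect everything the program prints (like: cat < fifoout).
cat "$workdir/fifoout" > "$workdir/session.log" &

echo "1 + 2" >&3                               # a small input
printf '6 * 7\n10 - 4\n' > "$workdir/fifoin"   # like: cat myfile > fifoin
echo "exit" >&3

exec 3>&-   # release the input FIFO
wait
cat "$workdir/session.log"
```

To filter the output, replace the reader with something like sed or awk reading from fifoout instead of a plain cat.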
