BASH, cat Buffer

cat report.txt | sed 's/<\/li>/<\/li> \n/g' > report.txt
This obviously results in an empty file.
Is there a mechanism that allows you to store the data before processing it or store the output until the command has finished executing and then write the file?
http://en.wikipedia.org/wiki/Pipeline_(Unix)#Implementation:
"...a receiving program may only be able to accept 100 bytes per second, but no data is lost. Instead, the output of the sending program is held in a queue. When the receiving program is ready to read data, the operating system sends its data from the queue, then removes that data from the queue."
Sounds like there should be a simple trick to load this into a queue instead of writing it immediately to file, then unload it after the command has finished?
Thanks much!

What you need is editing in place:
sed -i 's/<\/li>/<\/li> \n/g' report.txt
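If your sed lacks -i (or, as on BSD/macOS, wants a suffix argument after it), the same "store the output, then write the file" idea from the question can be done explicitly with a temporary file; report.txt.tmp below is just an arbitrary name:
sed 's/<\/li>/<\/li> \n/g' report.txt > report.txt.tmp && mv report.txt.tmp report.txt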


made fifo file without using mkfifo or mknod

I was trying to stream mp3 music through gnuradio using vlc and mpg123 player. Following this site's example
http://www.opendigitalradio.org/Simple_FM_transmitter_using_gnuradio
The commands are:
$ mkfifo stream_32k.fifo
$ mpg123 -r32000 -m -s http://maxxima.mine.nu:8000 >stream_32k.fifo
Using my own mp3 stream, I followed the example, however there was one time I FORGOT to put
$ mkfifo stream_32k.fifo
to the terminal and instead only typed
$ mpg123 -r32000 -m -s http://localhost:8080/mp3 >stream_32k.fifo
directly to the terminal. The result was a .fifo file that is not highlighted (like the one created with mkfifo)
When using it with gnuradio, the fifo file made with mkfifo could only be played once and its size would always return to 0 bytes.
The one I accidentally created without using mkfifo kept the bytes for a long time, and I could access it anytime I wanted, which proved more beneficial to me.
Is there a disadvantage in making fifos this way? Also can somebody please tell me what I actually did?
Thank you so much!
You just created a regular file. As such it kept the bytes on disk, whereas a real FIFO has nothing to do with permanent disk storage: it is essentially a buffer in memory which you give a "disk name" so that file-oriented commands can work with it. The disadvantage is that while you're writing a permanent disk file you cannot read from it at the same time (generally speaking; it depends on how the writing program actually writes, but you cannot rely on it).
Having .fifo in the file name does not make it a FIFO; the mkfifo utility is what attaches a filename to a FIFO.
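For instance, you can check which kind of file you ended up with using ls -l; the file names below are only for illustration:
mkfifo real.fifo              # a named pipe (FIFO)
touch not_really_a.fifo       # a regular file, despite the name
ls -l real.fifo not_really_a.fifo
# the FIFO's mode string starts with 'p' (e.g. prw-r--r--),
# the regular file's starts with '-' (e.g. -rw-r--r--)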
If you want to keep the file and play the stream at the same time, you can use a utility like tee:
mkfifo stream.fifo
mpg123 ...... | tee saved_stream.mp3 > stream.fifo
And then play from stream.fifo like you always do. tee will capture the bytes passing through it and save them to disk.

Ruby logging in realtime

I am trying to log some output to a file in real time using Ruby. I would like to be able to do a tail -f on the log file and watch the output get written. At the moment the file only gets written to once I stop the Ruby script. What I am trying to do seems straightforward.
I create the logfile
log = File.open(logFileName, "a")
I later write to it using:
log.puts "#{variable}"
Again, the log file gets created and the correct entries are in it, but only once I have stopped the script from running. I need to tail the log file and see the output in real time.
Thanks in advance!
Normally file input and output is buffered to a degree. You can disable this behaviour by flipping a flag:
log.sync = true
This disables buffering by forcing a flush operation after each write. With that enabled, programs like tail -f can read the data in real-time.

Displaying stdout on screen and a file simultaneously

I'd like to log standard output from a script of mine to a file, but also have it displayed on screen for real-time monitoring. The script outputs something about 10 times every second.
I tried to redirect stdout to a file and then tail -f that file from another terminal, but for some reason tail is updating the screen significantly slower than the script is writing to the file.
What's causing this lag? Is there an alternate method of getting one standard output stream both on my terminal and into a file for later examination?
I can't say why tail lags, but you can use tee:
tee ("redirect output to multiple files") copies standard input to standard output and also to any files given as arguments. This is useful when you want not only to send some data down a pipe but also to save a copy.
Example: <command> | tee <outputFile>
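Applied to the scenario in the question (script and file names here are placeholders), that looks like:
./monitor.sh | tee monitor.log       # output goes to the terminal and to monitor.log
./monitor.sh | tee -a monitor.log    # same, but append to the log instead of overwriting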
How much of a lag do you see? A few hundred characters? A few seconds? Minutes? Hours?
What you are seeing is buffering. Almost all file reads and writes are buffered. This includes input and output, and there is also some buffering taking place within pipes. It's just more efficient to pass a packet of data around rather than a byte at a time. (As an aside, I believe file names on HFS+ file systems are stored in UTF-16 while Mac OS X normally uses UTF-8 by default; NTFS likewise stores file names in UTF-16 while Windows uses code pages for character data by default.)
So, if you run tail -f from another terminal, you may be seeing buffering from tail; when you use a pipe and then tee, you may have a buffer in the pipe and in the tee command, which may be why you see the lag.
By the way, how do you know there's a lag? How do you know how quickly your program is writing to the disk? Do you print out something in your program to help track the writes to the file?
In that case, you might not be lagging as much as you think. File writes are also buffered. So, it is very possible that the lag isn't from the tail -f, but from your script writing to the file.
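If the writer side is the culprit and the script is an ordinary stdio-based program, one option (a sketch, assuming GNU coreutils' stdbuf is available; names are placeholders) is to force line buffering on its stdout:
stdbuf -oL ./myscript.sh > output.log    # in one terminal: line-buffered writes to the log
tail -f output.log                       # in another terminal: updates roughly line by line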
Use the tee command:
tail -f /path/logFile | tee outfile

Annoying cat/sed behavior when running through MATLAB on Mac

I am running a script to parse text email files that can be called by MATLAB or run from the command line. The script looks like this:
#!/bin/bash
MYSED=/opt/local/bin/gsed
"$MYSED" -n "/X-FileName/,/*/p" | "$MYSED" "/X-FileName/d" | "$MYSED" "/\-Original Message\-/q"
If I run cat message_file | ./parser.sh in my Terminal window, I get a parsed text file. If I do the same using the system command in MATLAB, I occasionally get the same parsed text followed by the error message
cat: stdout: Broken pipe
When I was using a sed command instead of a cat command, I was getting the same error message. This happens maybe on 1 percent of the files I am parsing, almost always large files where a lot gets deleted after the Original Message line. I do not get the error when I do not include the last pipe, the one deleting everything after 'Original Message'.
I would like to suppress the error message from cat if possible. Ideally, I would like to understand why running the script through MATLAB gives me an error while running it in Terminal does not? Since it tends to happen on larger files, I am guessing it has to do with a memory limitation, but 'broken pipe' is such a vague error message that I can't be sure. Any hints on either issue would be much appreciated.
I could probably run the script outside of MATLAB and save the processed files, but as some of the files are large I would much rather not duplicate them at this point.
The problem is occurring because of the final gsed command, "$MYSED" "/\-Original Message\-/q". This (obviously) quits as soon as it sees a match, and if the gsed feeding it tries to write anything after that, it'll receive SIGPIPE and quit; if there's enough data, the same will happen to the first gsed, and if there's enough data after that, SIGPIPE will be sent to the original cat command, which reports the error. Whether or not the error makes it back to cat will depend on timing, buffering, the amount of data, the phase of the moon, etc.
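A minimal way to see the same mechanism in isolation (yes and head stand in here for cat and the quitting gsed):
yes | head -n 1
# head exits after printing one line; the next time yes writes, it receives
# SIGPIPE and is terminated, just as cat and the upstream gsed stages are
# once the final gsed quits.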
My first suggestion would be to put the "$MYSED" "/\-Original Message\-/q" command at the beginning of the pipeline, and have it do the reading from the file (rather than feeding it from cat). This'd mean changing the script to accept the file to read from as an argument:
#!/bin/bash
MYSED=/opt/local/bin/gsed
"$MYSED" "/\-Original Message\-/q" "$#" | "$MYSED" -n "/X-FileName/,/*/p" | "$MYSED" "/X-FileName/d"
...and then run it with ./parser.sh message_file. If my assumptions about the message file format are right, changing the order of the gsed commands this way shouldn't cause trouble. Is there any reason the message file needs to be piped to stdin rather than passed as an argument and read directly?
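If the cat-based invocation has to stay, one way to simply hide the message (a workaround rather than a fix, and it silences everything cat writes to stderr) is:
cat message_file 2>/dev/null | ./parser.sh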

How to read data and read user response to each line of data both from stdin

Using bash I want to read over a list of lines and ask the user if the script should process each line as it is read. Since both the lines and the user's response come from stdin how does one coordinate the file handles? After much searching and trial & error I came up with the example
exec 4<&0
seq 1 10 | while read number
do
read -u 4 -p "$number?" confirmation
echo "$number $confirmation"
done
Here we use exec to duplicate stdin onto file descriptor 4, read the sequence of numbers from the piped standard input, and get the user's response via file descriptor 4. This seems like too much work. Is this the correct way of solving this problem? If not, what is the better way? Thanks.
You could just force read to take its input from the terminal, instead of the more abstract standard input:
while read number
do
< /dev/tty read -p "$number?" confirmation
echo "$number $confirmation"
done
The drawback is that you can't automate acceptance (by reading from a pipe connected to yes, for example).
Yes, using an additional file descriptor is a right way to solve this problem. Pipes can only connect one command's standard output (file descriptor 1) to another command's standard input (file descriptor 0). So when you're parsing the output of a command, if you need to obtain input from some other source, that other source has to be given by a file name or a file descriptor.
I would write this a little differently, making the redirection local to the loop, but it isn't a big deal:
seq 1 10 | while read number
do
read -u 4 -p "$number?" confirmation
echo "$number $confirmation"
done 4<&0
With a shell other than bash, in the absence of a -u option to read, you can use a redirection:
printf "%s? " "$number"; read confirmation <&4
You may be interested in other examples of using file descriptor reassignment.
Another method, as pointed out by chepner, is to read from a named file, namely /dev/tty, which is the terminal that the program is running in. This makes for a simpler script but has the drawback that you can't easily feed confirmation data to the script manually.
For your application, killmatching, two passes is totally the right way to go.
In the first pass you can read all the matching processes into an array. The number will be small (dozens typically, tens of thousands at most) so there are no efficiency issues. In bash the code will look something like:
candidates=()
while read -r thing; do
    candidates+=("$thing")
done < <(ps | grep ...)
(The process substitution matters: piping ps | grep into the while loop would run it in a subshell, and the additions to the array would be lost when it exits.)
The second pass will loop through the candidates array and do the interaction.
Also, if it's available on your platform, you might want to look into pgrep. It's not ideal, but it may save you a few forks, which cost more than all the array lookups in the world.
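A sketch of the pgrep variant (the pattern is a placeholder for whatever killmatching actually matches on):
candidates=()
while read -r pid; do
    candidates+=("$pid")
done < <(pgrep -f 'some_pattern')    # -f matches against the full command line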
