Annoying cat/sed behavior when running through MATLAB on Mac

I am running a script to parse text email files that can be called by MATLAB or run from the command line. The script looks like this:
#!/bin/bash
MYSED=/opt/local/bin/gsed
"$MYSED" -n "/X-FileName/,/*/p" | "$MYSED" "/X-FileName/d" | "$MYSED" "/\-Original Message\-/q"
If I run cat message_file | ./parser.sh in my Terminal window, I get a parsed text file. If I do the same using the system command in MATLAB, I occasionally get the same parsed text followed by the error message
cat: stdout: Broken pipe
When I was using a sed command instead of a cat command, I was getting the same error message. This happens maybe on 1 percent of the files I am parsing, almost always large files where a lot gets deleted after the Original Message line. I do not get the error when I do not include the last pipe, the one deleting everything after 'Original Message'.
I would like to suppress the error message from cat if possible. Ideally, I would also like to understand why running the script through MATLAB gives me an error while running it in Terminal does not. Since it tends to happen on larger files, I am guessing it has to do with a memory limitation, but 'broken pipe' is such a vague error message that I can't be sure. Any hints on either issue would be much appreciated.
I could probably run the script outside of MATLAB and save the processed files, but as some of the files are large I would much rather not duplicate them at this point.

The problem is occurring because of the final gsed command, "$MYSED" "/\-Original Message\-/q". This (obviously) quits as soon as it sees a match, and if the gsed feeding it tries to write anything after that it'll receive SIGPIPE and quit; if there's enough data the same will happen to the first gsed, and if there's enough data after that, SIGPIPE will be sent to the original cat command, which reports the error. Whether the error makes it back to cat will depend on timing, buffering, the amount of data, the phase of the moon, etc.
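If you just want to silence the message, one blunt workaround (my assumption, not something from the original question) is to discard cat's stderr, since that's where the complaint is written:
cat message_file 2>/dev/null | ./parser.sh
Note that this hides every cat error, not just the broken-pipe one, so restructuring the pipeline as below is the cleaner fix.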
My first suggestion would be to put the "$MYSED" "/\-Original Message\-/q" command at the beginning of the pipeline, and have it do the reading from the file (rather than feeding it from cat). This'd mean changing the script to accept the file to read from as an argument:
#!/bin/bash
MYSED=/opt/local/bin/gsed
"$MYSED" "/\-Original Message\-/q" "$#" | "$MYSED" -n "/X-FileName/,/*/p" | "$MYSED" "/X-FileName/d"
...and then run it with ./parser.sh message_file. If my assumptions about the message file format are right, changing the order of the gsed commands this way shouldn't cause trouble. Is there any reason the message file needs to be piped to stdin rather than passed as an argument and read directly?

Related

What does the `grep -m 1` command do in UNIX?

I googled this command but found nothing about it.
grep -m 1 "\[{" xxx.txt > xxx.txt
However, when I typed this command, no error occurred.
There was also no output from this command.
Can anyone explain how this command works?
This command reads from and writes to the same file, but not in a left-to-right fashion. In fact > xxx.txt runs first, emptying the file before the grep command starts reading it. Therefore there is no output. You can fix this by storing the result in a temporary file and then renaming that file to the original name.
PS: Some commands, like sed, have an in-place editing option (-i for sed) which works around this issue by not relying on shell redirects.
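A minimal sketch of the temporary-file fix (xxx.txt.tmp is just an arbitrary scratch name):
grep -m 1 "\[{" xxx.txt > xxx.txt.tmp && mv xxx.txt.tmp xxx.txt
The && ensures the original file is only replaced if grep actually found a match (grep exits non-zero otherwise).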

When data is piped from one program via | is there a way to detect what that program was from the second program?

Say you have a shell command like
cat file1 | ./my_script
Is there any way, from inside my_script, to detect which command was run to produce the pipe's input (in the above example, cat file1)?
I've been digging into it and so far I've not found any possibilities.
I've been unable to find any environment variables set in the process space of the second command recording the full command line. The command-line data my_script sees (via /proc etc.) is just ./my_script and doesn't include any information about it being run as part of a pipe. Even checking the process list from inside the second command doesn't seem to provide any data, since the first process often exits before the second starts.
The best information I've been able to find suggests that in bash, in some cases, you can get the exit codes of processes in the pipe via PIPESTATUS; unfortunately nothing similar seems to be present for the names of the commands/files in the pipe. My research seems to be saying it's impossible to do in a generic manner (I can't control how people decide to run my_script, so I can't force third-party pipe-replacement tools to be used over built-in shell pipes), but at the same time it doesn't seem like it should be impossible, since the shell has the full command line present as the command is run.
(Update: adding in later information following on from the comments below.)
I am on Linux.
I've investigated the /proc/$$/fd data and it almost does the job. If the first command doesn't exit for several seconds while piping data to the second command, you can read /proc/$$/fd/0 to see the pipe:[PIPEID] value that it symlinks to. That can then be used to search through the /proc/<pid>/fd/ data of the other running processes for another process with the same PIPEID open, which gives you the first process's pid.
However, in most real-world tests of piping I've done, you can't trust that the first command will stay running long enough for the second one to locate its pipe fd in /proc before it exits (which removes the proc data, preventing it from being read). So I can't rely on this method returning any information.
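To make the race concrete, here is a minimal sketch of the lookup described above (assuming Linux /proc, and that the writer is still alive and owned by the same user when we look; all names are illustrative):
#!/bin/bash
# Pipe inode our stdin points at, e.g. "pipe:[123456]".
pipe_id=$(readlink /proc/$$/fd/0)
# Scan every process's fds for the same pipe inode.
for fd in /proc/[0-9]*/fd/*; do
    if [ "$(readlink "$fd" 2>/dev/null)" = "$pipe_id" ]; then
        pid=${fd#/proc/}; pid=${pid%%/*}
        # Skip ourselves; print the writer's command line.
        [ "$pid" != "$$" ] && tr '\0' ' ' < "/proc/$pid/cmdline" && echo
    fi
done
If the writer has already exited by the time the loop runs, it simply finds nothing, which is exactly the unreliability described above.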

Bash script - run process & send to background if good, or else

I need to start up a Golang web server and leave it running in the background from a bash script. If the script in question is syntactically correct (as it will be most of the time) this is simply a matter of issuing a
go run /path/to/index.go &
However, I have to allow for the possibility that index.go is somehow erroneous. I should explain that in Golang this can happen for something as "trivial" as importing a module that you then fail to use. In this case the go run /path/to/index.go bit will return an error message. In the terminal this would be something along the lines of
index.go:4:10: expected...
What I need to be able to do is to somehow change that command above so I can funnel any error messages into a file for examination at a later stage. I tried variants on go run /path/to/index.go >> errors.txt with the terminating & in different positions but to no avail.
I suspect that there is a bash way to do this by altering the priority of evaluation of the command via some judiciously used braces/brackets etc. However, that is way beyond my bash capabilities. I would be most obliged to anyone who might be able to help.
Update
A few minutes later... After a few more experiments I have found that this works
go run /path/to/index.go &> errors.txt &
Quite apart from the fact that I don't in fact understand why it works, there remains the issue that it produces a 0-byte errors.txt file when the command runs to completion without Golang throwing up any error messages. Can someone shed light on what is going on and how it might be improved?
Taken from man bash.
Redirecting Standard Output and Standard Error
This construct allows both the standard output (file descriptor 1) and the standard error output (file descriptor 2) to be redirected to the file whose name is the expansion of word.
There are two formats for redirecting standard output and standard error:
&>word
and
>&word
Of the two forms, the first is preferred. This is semantically equivalent to
>word 2>&1
Appending Standard Output and Standard Error
This construct allows both the standard output (file descriptor 1) and the standard error output (file descriptor 2) to be appended to the file whose name is the expansion of word.
The format for appending standard output and standard error is:
&>>word
This is semantically equivalent to
>>word 2>&1
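Applied to the question, either form captures the compiler diagnostics while still backgrounding the server (errors.txt as in the original post):
go run /path/to/index.go &> errors.txt &     # truncate errors.txt on each run
go run /path/to/index.go &>> errors.txt &    # keep appending across runs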
Narūnas K's answer covers why the &> redirection works.
The reason why the file is created anyway is because the shell creates the file before it even runs the command in question.
You can see this by trying no-such-command > file.out: even though the shell errors because no-such-command doesn't exist, the file gets created (using &> on that test will get the shell's error in the file).
This is why you can't do things like sed 'pattern' file > file to edit a file in place.
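A quick demonstration of both points (file.out is just a scratch name):
no-such-command > file.out   # the shell prints "command not found" to the terminal
ls -l file.out               # yet file.out already exists, zero bytes long
no-such-command &> file.out  # now the "command not found" message lands in file.out
If the empty errors.txt bothers you, one option (my suggestion, not from the answers above) is to delete it afterwards when nothing was written: [ -s errors.txt ] || rm errors.txt.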

Switch from file contents to STDIN in piped command? (Linux Shell)

I have a program (that I did not write) which is not designed to read in commands from a file. Entering commands on STDIN is pretty tedious, so I'd like to be able to automate it by writing the commands in a file for re-use. Trouble is, if the program hits EOF, it loops infinitely trying to read in the next command dropping an endless torrent of menu options on the screen.
What I'd like to be able to do is cat a file containing the commands into the program via a pipe, then use some sort of shell magic to have it switch from the file to STDIN when it hits the file's EOF.
Note: I've already considered using cat with the '-' for STDIN. Unfortunately (I didn't know this before), piped commands wait for the first program's output to terminate before starting the second program -- they do not run in parallel. If there's some way to get the programs to run in parallel with that kind of piping action, that would work!
Any thoughts? Thanks for any assistance!
EDIT:
I should note that my goal is not only to prevent the system from hitting the end of the commands file. I would like to be able to continue typing in commands from the keyboard when the file hits EOF.
I would do something like
(cat your_file_with_commands; cat) | sh your_script
That way, when the file with commands is done, the second cat will feed your script with whatever you type on stdin afterwards.
Same as Idelic's answer with simpler syntax ;)
cat your_file_with_commands - | sh your_script
I would think expect would work for this.
Have you tried using something like tail -f commandfile | command? I think that should pipe the lines of the file to command without closing the file descriptor afterwards. Use -n to specify the number of lines to be piped if tail -f doesn't catch all of them.
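For instance (a hedged sketch; commandfile and command stand in for your actual file and program):
tail -n +1 -f commandfile | command   # emit the whole file from line 1, then keep the pipe open
Because tail never exits, the program on the right never sees EOF; the trade-off is that you have to kill the pipeline yourself when you're done.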

How do I jump to the first line of shell output? (shell equivalent of emacs comint-show-output)

I recently discovered 'comint-show-output' in emacs shell mode, which jumps to the first line of shell output, which I find incredibly handy when looking at shell output that exceeds a screen length. The advantages of this command over scrolling with 'page up' are A) you don't have to scan with your eyes for the first line of the output B) you only have to hit the key combo once (instead of 'page up' a number of times which probably is not known beforehand).
I thought about ending all my commands with '| more' but actually this is not what I want since most of the time, I want to retain all output in the terminal buffer, and I usually want to see the end of the shell output first.
I use OS X. Is there a terminal app (on OS X) and shell (on remote Linux) combination equivalent (so I can do something similar without using emacs all the time - I know, crazy talk)? I normally use bash, but would be fine with switching shells just for this feature.
The way I do this sort of thing is by sending my output to a file and then watching the file as it is written. You still get the results of the command dumped to terminal history in real time and can still inspect the output's actual contents further after the fact (or in another terminal, etc...)
command > output &
tail -f output
head output
You could always do something in bash like this:
alias foo='!! | more'
which would make foo run the previous command with more. I'm not sure of any way to do exactly what you are suggesting.
If you're expecting a lot of output and don't want to run your command twice, you can use tee(1) to fork the output:
my-command | tee /tmp/my-command.log | less
This will pipe the output to a paginator (less), while simultaneously logging the output to a file (in this case, a file named /tmp/my-command.log). If you need to review the output after you've quit from less, you can just cat the log file instead of re-running the command.
