shell scripting: error handling in piped command - ksh

I have a set of commands chained with pipes like this:
cat very_big_data.csv | awk -f ../bin/repair_preproc.awk | tr -d "\n" | tr "\007" "\n" | sed 's/> *</></g' > output.csv
The CSV file to be processed is quite big (10-20 GB), so I would like to keep it in one stream.
I need to be able to catch errors potentially raised in the chained commands; for example, I need to be sure that the awk, the tr and the sed commands ALL finished successfully.

Whoops, I've finally found an answer.
In bash, there is an array called PIPESTATUS, so you can check the exit status of every pipe member.
But I need to do this on AIX, where I only have ksh. In ksh there is a shell option, pipefail, which makes the exit status of a pipeline non-zero if any of its members fails. Well, it doesn't work in the default AIX ksh either, so you have to use the ksh93 shell.
There it is.
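For the record, a minimal ksh93 sketch of that approach (the pipeline is the one from the question; the error message and the /usr/bin/ksh93 path are only illustrative):

#!/usr/bin/ksh93
set -o pipefail    # the pipeline's exit status is non-zero if any member fails

cat very_big_data.csv | awk -f ../bin/repair_preproc.awk | tr -d "\n" | tr "\007" "\n" | sed 's/> *</></g' > output.csv

if [ $? -ne 0 ]; then
    print -u2 "one of the pipeline commands failed"
    exit 1
fi

In bash you could instead inspect "${PIPESTATUS[@]}" after the pipeline to see the exit status of each member individually.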

Related

How to add string to cat result in bash?

File info has some info on a line starting with myline. I'm trying to pass it to a script like this:
bash myscript `cat info | grep myline`
This works well. The script gets "myline" as the first argument. But now I want to add a "w" at the end of that. I tried:
bash myscript `cat info | grep myline`w
This is already problematic: the script gets "wyline" as the first argument.
And now the next step is that I actually want an if statement deciding whether to add the w or not. I tried this:
bash myscript `cat info | grep myline``[ "condition" == "condition"] && echo "w"`
This works the same way. The script gets "wyline" as the first argument.
So I have two questions:
1) How do I fix the "wyline" result to get the desired "mylinew"?
2) Is there a better way to write this if statement after cat?
Do not use backticks (`); use $(...) instead (see the Bash Hackers wiki on obsolete and deprecated syntax).
cat file | grep is a useless use of cat (see the Useless Use of Cat Award). Just grep file.
Just quote the result and add w:
myscript "$(grep myline info)w"
You can add a trailing w to the last line of input with sed:
myscript "$(grep myline info | sed '$s/$/w/')"
I would advise to always quote your variable expansions.
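For the conditional part of the question, one way to write it (a sketch; the test is only a placeholder for your real condition) is to build the suffix first:

suffix=""
if [ "$condition" = "yes" ]; then
    suffix="w"
fi
myscript "$(grep myline info)$suffix"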
Script gets "wyline" as first argument.
Your input file has DOS line endings. Inspect the output with cat -v, hexdump -C, or xxd. Use dos2unix to remove the carriage return characters.
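Combining the two answers above (a sketch): strip the carriage returns before appending the w, for example with tr:

myscript "$(grep myline info | tr -d '\r')w"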

Cannot grep a file from one line number to another in shell script

I am unable to grep a file from within a shell script. Below is the code:
#!/bin/bash
startline6=`cat /root/storelinenumber/linestitch6.txt`
endline6="$(wc -l < /mnt/logs/arcfilechunk-aim-stitch6.log.2017-11-08)"
awk 'NR>=$startline6 && NR<=$endline6' /mnt/logs/arcfilechunk-aim-stitch6.log.2017-11-08 | grep -C 100 'Error in downloading indivisual chunk' > /root/storelinenumber/error2.txt
The awk command works on a standalone basis, though, when the start and end line numbers are given manually.
There was an issue with the syntax. The last line was modified to
awk 'NR>='"$startline9"' && NR<='"$endline9"'' /mnt/logs/arcfilechunk-aim-stitch9.log | grep -C 100 'Error in downloading indivisual chunk' >> /root/storelinenumber/error.txt
It solved the issue.
You have your attempted variable expansions within single quotes, meaning that they won't actually be expanded.
When passing shell variables into awk, I prefer them to be actual first-class awk variables so I don't have to worry about that sort of stuff:
awk -v stl="$startline6" -v endl="$endline6" 'NR>=stl && NR<=endl ...
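Putting it together, a sketch of the corrected script using awk variables (the paths and the grep pattern are taken verbatim from the question):

#!/bin/bash
startline6=$(cat /root/storelinenumber/linestitch6.txt)
endline6=$(wc -l < /mnt/logs/arcfilechunk-aim-stitch6.log.2017-11-08)
awk -v stl="$startline6" -v endl="$endline6" 'NR>=stl && NR<=endl' /mnt/logs/arcfilechunk-aim-stitch6.log.2017-11-08 |
    grep -C 100 'Error in downloading indivisual chunk' > /root/storelinenumber/error2.txt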

pipe tail output into another script

I am trying to pipe the output of a tail command into another bash script to process:
tail -n +1 -f your_log_file | myscript.sh
However, when I run it, the $1 parameter (inside myscript.sh) is never set. What am I missing? How do I pipe the output to be the input parameter of the script?
PS - I want tail to run forever and continue piping each individual line into the script.
Edit
For now, the entire contents of myscript.sh are:
echo $1;
Generally, here is one way to handle standard input to a script:
#!/bin/bash
while read -r line; do
    echo "$line"
done
That is a very rough bash equivalent to cat. It does demonstrate a key fact: each command inside the script inherits its standard input from the shell, so you don't really need to do anything special to get access to the data coming in. read takes its input from the shell, which (in your case) is getting its input from the tail process connected to it via the pipe.
As another example, consider this script; we'll call it 'mygrep.sh'.
#!/bin/bash
grep "$1"
Now the pipeline
some-text-producing-command | ./mygrep.sh bob
behaves identically to
some-text-producing-command | grep bob
$1 is set if you call your script like this:
./myscript.sh foo
Then $1 has the value "foo".
The positional parameters and standard input are separate; you could do this
tail -n +1 -f your_log_file | myscript.sh foo
Now standard input is still coming from the tail process, and $1 is still set to 'foo'.
Perhaps you were confused with awk?
tail -n +1 -f your_log_file | awk '{
print $1
}'
would print the first column from the output of the tail command.
In the shell, a similar effect can be achieved with:
tail -n +1 -f your_log_file | while read first junk; do
echo "$first"
done
Alternatively, you could put the whole while ... done loop inside myscript.sh
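A small sketch of such a myscript.sh, using both the positional parameter and the lines arriving on stdin (the prefixing behaviour is made up purely for illustration):

#!/bin/bash
# $1 comes from the command line; the log lines arrive on stdin from tail.
while read -r first junk; do
    echo "$1: $first"
done

You would then invoke it as: tail -n +1 -f your_log_file | ./myscript.sh foo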
Piping connects the output (stdout) of one process to the input (stdin) of another process. stdin is not the same thing as the arguments sent to a process when it starts.
What you want to do is convert the lines in the output of your first process into arguments for the second process. This is exactly what the xargs command is for.
All you need to do is use xargs to invoke your script once per line, and it will work:
tail -n +1 -f your_log_file | xargs -L 1 myscript.sh

Use output of bash command (with pipe) as a parameter for another command

I'm looking for a way to use the output of a command (say command1) as an argument for another command (say command2).
I encountered this problem when trying to grep the output of the who command using a pattern given by another set of commands (actually tty piped to sed).
Context:
If tty displays:
/dev/pts/5
And who displays:
root pts/4 2012-01-15 16:01 (xxxx)
root pts/5 2012-02-25 10:02 (yyyy)
root pts/2 2012-03-09 12:03 (zzzz)
Goal:
I want only the line(s) regarding "pts/5"
So I piped tty to sed as follows:
$ tty | sed 's/\/dev\///'
pts/5
Test:
The following attempted command doesn't work:
$ who | grep $(echo $(tty) | sed 's/\/dev\///')
Possible solution:
I've found out that the following works just fine:
$ eval "who | grep $(echo $(tty) | sed 's/\/dev\///')"
But I'm sure the use of eval could be avoided.
As a final side note: I've noticed that the "-m" argument to who gives me exactly what I want (only the line of who that relates to the current user). But I'm still curious how I could make this combination of pipes and command nesting work...
One usually uses xargs to make the output of one command an option to another command. For example:
$ cat command1
#!/bin/sh
echo "one"
echo "two"
echo "three"
$ cat command2
#!/bin/sh
printf '1 = %s\n' "$1"
$ ./command1 | xargs -n 1 ./command2
1 = one
1 = two
1 = three
$
But ... while that was your question, it's not what you really want to know.
If you don't mind storing your tty in a variable, you can use bash variable mangling to do your substitution:
$ tty=`tty`; who | grep -w "${tty#/dev/}"
ghoti pts/198 Mar 8 17:01 (:0.0)
(You want the -w because if you're on pts/6 you shouldn't see pts/60's logins.)
You're limited to doing this in a variable, because if you try to put the tty command into a pipe, it thinks that it's not running associated with a terminal anymore.
$ true | echo `tty | sed 's:/dev/::'`
not a tty
$
Note that nothing in this answer so far is specific to bash. Since you're using bash, another way around this problem is to use process substitution. For example, while this does not work:
$ who | grep "$(tty | sed 's:/dev/::')"
This does:
$ grep $(tty | sed 's:/dev/::') < <(who)
You can do this without resorting to sed with the help of Bash variable mangling, although as @ruakh points out this won't work in the single-line version (without the semicolon separating the commands). I'm leaving this first approach up because I think it's interesting that it doesn't work in a single line:
TTY=$(tty); who | grep "${TTY#/dev/}"
This first puts the output of tty into a variable, then strips the leading /dev/ where it is used in grep. But without the semicolon, TTY is not yet set by the time bash does the variable expansion/mangling for grep.
Here's a version that does work because it spawns a subshell with the already modified environment (that has TTY):
TTY=$(tty) WHOLINE=$(who | grep "${TTY#/dev/}")
The result is left in $WHOLINE.
@Eduardo's answer is correct (and as I was writing this, a couple of other good answers have appeared), but I'd like to explain why the original command is failing. As usual, set -x is very useful to see what's actually happening:
$ set -x
$ who | grep $(echo $(tty) | sed 's/\/dev\///')
+ who
++ sed 's/\/dev\///'
+++ tty
++ echo not a tty
+ grep not a tty
grep: a: No such file or directory
grep: tty: No such file or directory
It's not completely explicit in the above, but what's happening is that tty is outputting "not a tty". This is because it's part of the pipeline being fed the output of who, so its stdin is indeed not a tty. This is the real reason everyone else's answers work: they get tty out of the pipeline, so it can see your actual terminal.
BTW, your proposed command is basically correct (except for the pipeline issue), but unnecessarily complex. Don't use echo $(tty), it's essentially the same as just tty.
You can do it like this:
tid=$(tty | sed 's#/dev/##') && who | grep "$tid"

How to execute the output of a command within the current shell?

I'm well aware of the source (aka .) utility, which will take the contents from a file and execute them within the current shell.
Now, I'm transforming some text into shell commands, and then running them, as follows:
$ ls | sed ... | sh
ls is just a random example; the original text can be anything. sed, too, is just an example of transforming text. The interesting bit is sh: I pipe whatever I get to sh and it runs it.
My problem is, that means starting a new sub shell. I'd rather have the commands run within my current shell. Like I would be able to do with source some-file, if I had the commands in a text file.
I don't want to create a temp file because that feels dirty.
Alternatively, I'd like to start my sub shell with the exact same characteristics as my current shell.
update
Ok, the solutions using backticks certainly work, but I often need to do this while I'm checking and changing the output, so I'd much prefer it if there were a way to pipe the result into something at the end.
sad update
Ah, the /dev/stdin thing looked so pretty, but, in a more complex case, it didn't work.
So, I have this:
find . -type f -iname '*.doc' | ack -v '\.doc$' | perl -pe 's/^((.*)\.doc)$/git mv -f $1 $2.doc/i' | source /dev/stdin
Which ensures all .doc files have their extension lowercased.
And which, incidentally, can be handled with xargs, but that's beside the point.
find . -type f -iname '*.doc' | ack -v '\.doc$' | perl -pe 's/^((.*)\.doc)$/$1 $2.doc/i' | xargs -L1 git mv
So, when I run the former, it exits right away and nothing happens.
The eval command exists for this very purpose.
eval "$( ls | sed... )"
More from the bash manual:
eval
eval [arguments]
The arguments are concatenated together into a single command, which is then read and executed, and its exit status returned as the exit status of eval. If there are no arguments or only empty arguments, the return status is zero.
$ ls | sed ... | source /dev/stdin
UPDATE: This works in bash 4.0, as well as tcsh, and dash (if you change source to .). Apparently this was buggy in bash 3.2. From the bash 4.0 release notes:
Fixed a bug that caused `.' to fail to read and execute commands from non-regular files such as devices or named pipes.
Try using process substitution, which replaces output of a command with a temporary file which can then be sourced:
source <(echo id)
Wow, I know this is an old question, but I've found myself with the same exact problem recently (that's how I got here).
Anyway - I don't like the source /dev/stdin answer, but I think I found a better one. It's deceptively simple actually:
echo ls -la | xargs xargs
Nice, right? Actually, this still doesn't do what you want, because if you have multiple lines it will concatenate them into a single command instead of running each command separately. So the solution I found is:
ls | ... | xargs -L 1 xargs
The -L 1 option means you use (at most) 1 line per command execution. Note: if your line ends with a trailing space, it will be concatenated with the next line! So make sure each line ends with a non-space.
Finally, you can do
ls | ... | xargs -L 1 xargs -t
to see what commands are executed (-t is verbose).
Hope someone reads this!
`ls | sed ...`
I sort of feel like ls | sed ... | source - would be prettier, but unfortunately source doesn't understand - to mean stdin.
I believe this is "the right answer" to the question:
ls | sed ... | while read line; do $line; done
That is, one can pipe into a while loop; the read command takes one line from its stdin and assigns it to the variable $line. $line then becomes the command executed within the loop, and it continues until there are no further lines in its input.
This still won't work with some control structures (like another loop), but it fits the bill in this case.
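Applied to the .doc-renaming pipeline from the question, that loop might look like this (just a sketch; it assumes the file paths contain no spaces or shell metacharacters, since $line is deliberately left unquoted so that word splitting produces the command and its arguments):

find . -type f -iname '*.doc' | ack -v '\.doc$' | perl -pe 's/^((.*)\.doc)$/git mv -f $1 $2.doc/i' | while read -r line; do $line; done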
To use mark4o's solution on bash 3.2 (macOS), a here string can be used instead of a pipeline, as in this example:
. /dev/stdin <<< "$(grep '^alias' ~/.profile)"
I think your solution is command substitution with backticks: http://tldp.org/LDP/Bash-Beginners-Guide/html/sect_03_04.html
See section 3.4.5
Why not use source then?
$ ls | sed ... > out.sh ; source out.sh
