I'm relatively new to shell scripting and I'm in the process of writing my own health checking scripts using bash.
Is the following script to test CPU load the best I can do in terms of performance, readability, and maintainability?
#!/bin/sh
getloadavg5 () {
    echo $(cat /proc/loadavg | cut -f2 -d' ')
}
getnumcpus () {
    echo $(cat /proc/cpuinfo | grep '^processor' | wc -l)
}
awk \
    -v failthold=0.8 \
    -v warnthold=0.7 \
    -v loadavg=$(getloadavg5) \
    -v numcpus=$(getnumcpus) \
    'BEGIN {
        ratio=loadavg/numcpus
        if (ratio >= failthold) exit 2
        if (ratio >= warnthold) exit 1
        exit 0
    }'
This might be more suitable for Code Review Stack Exchange, but, without condoning the use of load averages in this way, here are some ideas:
#!/bin/sh
read -r one five fifteen rest < /proc/loadavg
cpus=$(grep -c '^processor' /proc/cpuinfo)
awk \
    -v failthold=0.8 \
    -v warnthold=0.7 \
    -v loadavg="$five" \
    -v numcpus="$cpus" \
    'BEGIN {
        ratio=loadavg/numcpus
        if (ratio >= failthold) exit 2
        if (ratio >= warnthold) exit 1
        exit 0
    }'
It doesn't have any of the unnecessary cat/echo calls.
It also happens to run faster thanks to forking 1 or 2 times (depending on shell) instead of ~10, but if performance is an issue then shell scripts should be avoided in general.
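For completeness, here is one way a caller could consume those exit codes (the script name check_load.sh is my placeholder; the 0/1/2 convention comes from the script above):

./check_load.sh
case $? in
    0) echo "OK" ;;
    1) echo "WARN: load average is high" ;;
    2) echo "FAIL: load average is critical" ;;
esac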
Let me show you a snippet of my Bash script and how I try to run parallel:
parallel -a "$file" \
-k \
-j8 \
--block 100M \
--pipepart \
--bar \
--will-cite \
_fix_col_number {} | _unify_null_value {} >> "$OUTPUT_DIR/$new_filename"
So, I am basically trying to process each line in a file in parallel using Bash functions defined inside my script. However, I am not sure how to pass each line to my defined functions "_fix_col_number" and "_unify_null_value". Whatever I do, nothing gets passed to the functions.
I am exporting the functions like this in my script:
declare -x NUM_OF_COLUMNS
export -f _fix_col_number
export -f _add_tabs
export -f _unify_null_value
The mentioned functions are:
_unify_null_value()
{
    _string=$(echo "$1" | perl -0777 -pe "s/(?<=\t)\.(?=\s)//g" | \
        perl -0777 -pe "s/(?<=\t)NA(?=\s)//g" | \
        perl -0777 -pe "s/(?<=\t)No Info(?=\s)//g")
    echo "$_string"
}

_add_tabs()
{
    _tabs=""
    for (( c=1; c<=$1; c++ ))
    do
        _tabs="$_tabs\t"
    done
    echo -e "$_tabs"
}

_fix_col_number()
{
    line_cols=$(echo "$1" | awk -F"\t" '{ print NF }')
    if [[ $line_cols -gt $NUM_OF_COLUMNS ]]; then
        new_line=$(echo "$1" | cut -f1-"$NUM_OF_COLUMNS")
        echo -e "$new_line\n"
    elif [[ $line_cols -lt $NUM_OF_COLUMNS ]]; then
        missing_columns=$(( NUM_OF_COLUMNS - line_cols ))
        new_line="${1//$'\n'/}$(_add_tabs $missing_columns)"
        echo -e "$new_line\n"
    else
        echo -e "$1"
    fi
}
I tried removing {} from parallel. Not really sure what I am doing wrong.
I see two problems in the invocation plus additional problems with the functions:
With --pipepart there are no arguments. The blocks read from -a file are passed over stdin to your functions. Try the following commands to confirm this:
seq 9 > file
parallel -a file --pipepart echo
parallel -a file --pipepart cat
Theoretically, you could read stdin into a variable and pass that variable to your functions, ...
parallel -a file --pipepart 'b=$(cat); someFunction "$b"'
... but I wouldn't recommend it, especially since your blocks are 100MB each.
Bash interprets the pipe | in your command before parallel even sees it. To run a pipe, quote the entire command:
parallel ... 'b=$(cat); _fix_col_number "$b" | _unify_null_value "$b"' >> ...
_fix_col_number seems to assume its argument to be a single line, but receives 100MB blocks instead.
_unify_null_value does not read stdin, so _fix_col_number {} | _unify_null_value {} is equivalent to _unify_null_value {}.
That being said, your functions can be drastically improved. They start a lot of processes which becomes incredibly expensive for larger files. You can do some trivial improvements like combining perl ... | perl ... | perl ... into a single perl. Likewise, instead of storing everything in variables, you can process stdin directly: Just use f() { cmd1 | cmd2; } instead of f() { var=$(echo "$1" | cmd1); var=$(echo "$var" | cmd2); echo "$var"; }.
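For example, applying both of those suggestions to _unify_null_value might look like this (a sketch: the three perl calls collapse into one alternation, and the function filters stdin instead of echoing an argument):

_unify_null_value() {
    # one perl instead of three; reads stdin, writes stdout
    perl -0777 -pe 's/(?<=\t)(\.|NA|No Info)(?=\s)//g'
}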
However, don't waste time on small things like these. A complete rewrite in sed, awk, or perl is easy and should outperform every optimization of the existing functions.
Try
n="INSERT NUMBER OF COLUMNS HERE"
tabs=$(perl -e "print \"\t\" x $n")
perl -pe "s/\r?\$/$tabs/; s/\t\K(\.|NA|No Info)(?=\s)//g;" file |
cut -f "1-$n"
If you still find this too slow, leave out file; pack the command into a function, export that function, and then call parallel -a file -k --pipepart nameOfTheFunction. The option --block is not necessary, as --pipepart will evenly split the input based on the number of jobs (which can be specified with -j).
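Put together, that could look like this (a sketch; fix_block and output.tsv are my names, and n/tabs must be exported so the function sees them in the subshells parallel starts):

n="INSERT NUMBER OF COLUMNS HERE"
tabs=$(perl -e "print \"\t\" x $n")
fix_block() {
    # pad each line with tabs, strip the null markers, then trim to n columns
    perl -pe "s/\r?\$/$tabs/; s/\t\K(\.|NA|No Info)(?=\s)//g;" | cut -f "1-$n"
}
export n tabs
export -f fix_block
parallel -a file -k --pipepart fix_block > output.tsv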
I have a log file. I run tail -f on it and grep for patterns as new log entries arrive. I'm facing a loop issue: the body executes multiple times. Here is my script:
AuditTypeID=$""
QueryResult=$""
tail -n 0 -F hive-server2.log \
| while read LINE
do
    if [ `echo $LINE | grep -c "select *"` -gt 0 ]
    then
        AuditTypeID=15
        QueryResult=$(
            awk '
                BEGIN{ print "" }
                /Executing command\(queryId/{ sub(/.*queryId=[^[:space:]]+: /,""); q=$0 }
                /s3:\/\//{ print "," q }
            ' OFS=',' hive-server2.log \
            | sed -n \$p
        )
    elif [ `echo $LINE | grep -c 'select count'` -gt 0 ]
    then
        AuditTypeID=22
        QueryResult="$(
            grep -oE 'select count\(.\) from [a-zA-Z][a-zA-Z0-9]*' hive-server2.log \
            | sed -n \$p
        )"
    fi
    user=$(
        cat hive-server2.log \
        | grep user \
        | awk -F "[. ]" '{print "," $(NF-1)}' \
        | tr -d ',' \
        | tr -d 'UTC'
    )
    Additional_Info=$(
        echo -e "{\"user\":\"""${user}""\", \"query\":\"""${QueryResult}""\",\"s3Path\":\"""${s3}""\"}"
    )
    echo -e "$Additional_Info" > op.json
    for file in /var/log/hive/op.json
    do
        boto-rsync $file s3://hive-log/log/script/$file.$current_time
    done
done
It filters the operations based on keywords. For some reason it executes multiple times; I need to save the output only once per query, and any help simplifying the logic is appreciated.
The first thing I see in your script is that the awk scriptlet inside the if statement reparses the whole of hive-server2.log. That is probably racy and bad, because you are tailing that same file into your script while it is still growing. Reparsing the log is a common theme in the script, and I think it is the root cause of the issue.
One readily apparent simplification ;) is removing the elif branch -- it will never run, because any line containing 'select count' is already matched by the if statement's "select *" pattern.
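To see why, note that a 'select count' line already matches the first test:

echo 'select count(*) from t' | grep -c 'select *'    # prints 1, so the if branch wins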
To truly take a stab at simplifying this, my strategy would be to rewrite the whole of this in awk. There is nothing that you are doing here that is beyond awk's built-in capabilities -- and awk can fire off external shell commands as easily as sh. The awk implementation will also likely be much faster.
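A skeleton of that approach might look like the following (hypothetical: the patterns, JSON shape, and s3 path are mine, and there is no proper JSON escaping):

tail -n 0 -F hive-server2.log | awk '
/select count/  { id = 22; q = $0 }    # test the more specific pattern first
!id && /select/ { id = 15; q = $0 }
id {
    printf "{\"AuditTypeID\": %d, \"query\": \"%s\"}\n", id, q > "op.json"
    close("op.json")                   # close so the next write starts a fresh file
    system("boto-rsync op.json s3://hive-log/log/script/op.json")
    id = 0
}'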
I started trying to do this translation, but with the way you are specifying multiple reparsing of hive-server2.log, I frankly got lost. Having a bit of input and intended output would help here... Please post hive-server2.log and your expected output.
In bash we can set a variable from a sequence of commands using read and a pipe from a subprocess. But I'm having trouble detecting errors in my processing in one edge case - a part of the subprocess pipeline producing some output before erroring.
A simplified example, which takes an input file, looks for a line containing "foo", and sets var to the first word on that line, is:
set -e
set -o pipefail
set -o nounset
die() {
    echo "DEAD: $1" >&2
    exit 1
}
read -r var rest < <( \
    cat data.txt \
    | grep foo \
    || die "PIPELINE" \
) || die "OUTER"
echo "var=$var"
Running this with data.txt like
blah
zap foo awesome
bang foo
will output
var=zap
Running this on a data.txt file that doesn't contain foo outputs (to stderr)
DEAD: PIPELINE
DEAD: OUTER
This is all as expected.
We can introduce another no-op stage like cat at the end of the pipeline
...
read -r var rest < <( \
    cat data.txt \
    | grep foo \
    | cat \
    || die "PIPELINE" \
) || die "OUTER"
...
and everything continues to work.
But if the additional stage is paste -s -d' ' and the input does not contain "foo" the output is
var=
DEAD: PIPELINE
Which seems to show that the pipeline errors, but read succeeds with an empty line. (It looks like paste -s -d' ' outputs a line of output even when its input is empty.)
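You can confirm what paste emits for empty input on your system with:

printf '' | paste -s -d' ' | od -c

If that shows a lone \n, the empty line it produces is exactly what read consumes.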
Is there a simple way to detect this failure of the pipeline, and cause the main script to error out?
I guess I could check that the variable is not empty - but this is a simplified version - I'm actually using sed and paste to join multiple lines to set multiple variables, like
read -r v1 v2 v3 rest < <( \
    cat data.txt \
    | grep "^foo=" \
    | sed -e 's/foo=//' \
    | paste -s -d' ' \
    || die "PIPELINE" \
) || die "OUTER"
You could use another grep to see if the output of paste contained something. grep . only matches lines with at least one character, so the empty line produced by paste fails the match and makes the pipeline exit non-zero:
read -r var rest < <( \
    cat data.txt \
    | grep foo \
    | paste -s -d' ' \
    | grep . \
    || die "PIPELINE" \
) || die "OUTER"
In the end I went with two different solutions depending on the context.
The first was to pipe the results to a temporary file. This will process the entire file before performing the read, and thus any failures in the pipe will cause the script to fail.
cat data.txt \
| grep "^foo=" \
| sed -e 's/foo=//' \
| paste -s -d' ' \
> "$TMP/result.txt" \
|| die "PIPELINE"
read -r var rest < "$TMP/result.txt" || die "OUTER"
The second was to just test that the variables were set. While this meant
there was a bunch of duplication that I wanted to avoid, it seemed the most bullet-proof solution.
read -r var rest < <( cat data.txt \
    | grep "^foo=" \
    | sed -e 's/foo=//' \
    | paste -s -d' ' \
    || die "PIPELINE" \
) || die "OUTER"
[ ! -z "$var" ] || die "VARIABLE NOT SET"
I have an alias in my bashrc file that outputs the current folder contents and the system's available storage, updated continuously by the watch command.
alias wtch='watch -n 0 -t "du -sch * -B 1000000 2>/dev/null | sort -h && df -h -B 1000000| head -2 | awk '{print \$4}'"'
The string worked fine until I put in the awk part. I know I need to escape the single quotation marks, and the $4, while still staying inside the double quotation marks, but I haven't been able to get it to work. What am I doing wrong?
This is the error I get
-bash: alias: $4}": not found
Since the quoting for the alias is making it tough, you could just make it a function instead:
wtch() {
    watch -n 0 -t "du -sch * -B 1000000 2>/dev/null | sort -h && df -h -B 1000000 | head -2 | awk '{print \$4}'"
}
This is a lot like issue 2 in the BashFAQ/050
Also, a minor thing but you can skip the head process at the end and just have awk do it, even exiting after the second row like
wtch() {
    watch -n 0 -t "du -sch * -B 1000000 2>/dev/null | sort -h && df -h -B 1000000 | awk '{print \$4} NR >= 2 {exit}'"
}
In this case you can use cut instead of awk, with the same effect.
alias wtch="watch -n 0 -t 'du -sch * -B 1000000 2>/dev/null | sort -h && df -h -B 1000000| head -2 | cut -d\ -f4'"
Explaining cut:
-d option defines a delimiter
-d\ means that my delimiter is space
-f selects a column
-f4 gives you the fourth column
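A standalone example of the escaped-space delimiter (the sample line is mine):

echo 'a b c d e' | cut -d\  -f4    # prints d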
OK, so I have the following script that updates Route53 DNS entries. Unfortunately there is a limit to the number of calls per second you can make, so I need to make the final xargs command sleep for about a second between each iteration.
I've tried a couple of things like ' {../cli53 blah; sleep 10; } ' and I can't seem to get it to work. Does anyone have any suggestions, please? Here is the script:
#!/bin/bash
set root='dirname $0'
ec2-describe-instances -O ******* -W ******* --region eu-west-1 |
perl -ne '/^INSTANCE\s+(i-\S+).*?(\S+\.amazonaws\.com)/
and do { $dns = $2; print "$1 $dns\n" }; /^TAG.+\sName\s+(\S+)/
and print "$1 $dns\n"' |
perl -ane 'print "$F[0] CNAME $F[1] --replace\n"' |
grep -v 'i-' | xargs --verbose -n 4 /usr/local/bin/cli53 rrcreate -x 5 contoso.com
Edit: Thanks Etan for the answer. Here is my solution, for anyone else that needs it. I had to include the -I switch in the xargs statement as well, to make sure the input was passed as parameters to cli53, but it all looks to be working nicely now.
#!/bin/bash
set root='dirname $0'
ec2-describe-instances -O ******* -W ******* --region eu-west-1 |
perl -ne '/^INSTANCE\s+(i-\S+).*?(\S+\.amazonaws\.com)/
and do { $dns = $2; print "$1 $dns\n" }; /^TAG.+\sName\s+(\S+)/
and print "$1 $dns\n"' |
perl -ane 'print "$F[0] CNAME $F[1] --replace\n"' |
grep -v '^i-' |
xargs --verbose -n 4 -I myvar /bin/sh -c '{ /usr/local/bin/cli53 rrcreate -x 5 contoso.com 'myvar'; sleep 1; printf "\n\n"; }'
The simplest solution would be to simply put the cli53 and sleep calls in a script and use xargs to execute the script.
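For example (a sketch; the script name is mine, and the record details mirror your command):

cat > cli53-slow.sh <<'EOF'
#!/bin/sh
# create the record, then pause to stay under the API rate limit
/usr/local/bin/cli53 rrcreate -x 5 contoso.com "$@"
sleep 1
EOF
chmod +x cli53-slow.sh
... | xargs --verbose -n 4 ./cli53-slow.sh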
If you don't want to do that you should be able to do what you were trying to do with this:
... | xargs ... /bin/sh -c '{ /usr/local/bin/cli53 ... "$@"; sleep 10; }' -
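The trailing - becomes $0 for sh, so the arguments xargs appends land in "$@". A toy illustration (echo stands in for cli53):

printf '%s\n' a b c d | xargs -n 2 /bin/sh -c 'echo "batch: $@"; sleep 1' -
# batch: a b
# batch: c d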