Error in Echo of awk command piped to wc -l - bash

I have hundreds of files containing lines similar to this:
>34764998 Halalkalicoccus_jeotgali_B3 -132.6938 Halalkalicoccus 0.528 Halobacteriaceae 0.638 Halobacteriales 0.648 Halobacteria 0.706 Euryarchaeota 0.850
I am interested in counting the number of items in column 5 that are less than 0.1, ..., 0.95. I have written a bash script that calls an awk command to evaluate the column value and then pipes the result into wc -l (see below). However, I don't quite have my $, ', and brackets arranged correctly. Can anyone advise me as to what I did incorrectly? This is probably not the most efficient way, so I welcome suggestions, but I do want to know what I did wrong with the code I listed.
for fileName in 4440319.3_genus.txt 4440372.3_genus.txt 4440373.3_genus.txt 4440378.3_genus.txt 4440379.3_genus.txt 4440380.3_genus.txt 4440381.3_genus.txt
do
echo $fileName
for number in 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95
do
#NUM={awk '$5 < '$number' {print $5}' $filename | wc -l}
NUM={awk '$5 < $number {print $5}' $filename | wc -l}
#NUM=${awk '$5 < '$number' {print $5}' $filename | wc -l}
#NUM=${awk '$5 < $number {print $5}' $filename | wc -l}
echo $NUM
done
done
exit 0
All variations yield invalid option errors depending on which line is un-commented.
Thank you very much.

You don't need the wc -l pipe, and you don't even need the for loop over the file names. Try this:
awk -v n=0.95 '$5<n{a++}END{print a}' *_genus.txt
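If per-file counts for every threshold are wanted, the same idea might be extended so that a single awk process covers all files and all cut-offs. This is only a sketch: the sample files, their contents, and the shortened threshold list are made up for illustration.

```shell
# Hypothetical sample files standing in for the real *_genus.txt data.
tmpdir=$(mktemp -d)
printf '%s\n' '>1 A -1.0 GenusA 0.05' '>2 B -2.0 GenusB 0.528' \
              '>3 C -3.0 GenusC 0.91' > "$tmpdir/a_genus.txt"
printf '%s\n' '>4 D -4.0 GenusD 0.30' '>5 E -5.0 GenusE 0.97' \
              > "$tmpdir/b_genus.txt"

report=$(cd "$tmpdir" && awk '
  BEGIN { nt = split("0.1 0.5 0.95", limit, " ") }   # shortened threshold list
  FNR == 1 { files[++nf] = FILENAME }                # remember file order
  { for (i = 1; i <= nt; i++)
      if ($5 + 0 < limit[i] + 0) count[FILENAME, i]++ }
  END {
    for (f = 1; f <= nf; f++)
      for (i = 1; i <= nt; i++)
        print files[f], limit[i], count[files[f], i] + 0   # + 0 prints 0, not an empty string
  }' *_genus.txt)

echo "$report"
rm -rf "$tmpdir"
```

Each output line is file name, threshold, count. The `+ 0` in the END block matters because an unset counter would otherwise print as an empty string when nothing matched.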

Assuming that you're using sh or bash, here's what I'd do:
NUM=`awk -v x=$number '$5 < x {print $5}' $fileName | wc -l`
Some explanation why this works and your attempts do not work:
You need to execute the pipe and store its output in variable NUM. That's why you need the backquotes around the pipe.
Your $number is a shell variable. Shell variable expansion does not take place inside single quotes, so your $number in the awk script has no chance of being substituted with the numbers that you want. To deal with this, you can either use double quotes to embed the number in the right place (this will cause some trouble because of the other dollar signs in the awk script that you don't want to be shell expanded), or you can use an awk variable that is externally initialized. That's what the -v argument does.
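Last but not least, the loop variable is fileName, but your awk line reads $filename with a lowercase 'n'. Shell variable names are case-sensitive, so that needs fixing too.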
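The two ways of getting the shell value into awk that are described above could look roughly like this. The sample lines and the value 0.6 are invented for the demonstration; only the quoting pattern is the point.

```shell
# Hypothetical sample data: column 5 holds the value being compared.
tmp=$(mktemp)
printf '%s\n' '>1 A -1.0 GenusA 0.05' \
              '>2 B -2.0 GenusB 0.528' \
              '>3 C -3.0 GenusC 0.91' > "$tmp"
number=0.6

# Option 1: hand the shell value to awk as an awk variable with -v.
count_v=$(awk -v n="$number" '$5 < n { c++ } END { print c + 0 }' "$tmp")

# Option 2: double quotes let the shell expand $number, but every awk
# dollar sign then has to be escaped as \$ to survive the shell.
count_dq=$(awk "\$5 < $number { c++ } END { print c + 0 }" "$tmp")

echo "$count_v $count_dq"
rm -f "$tmp"
```

Both variants count the same lines here; the -v form is usually easier to read because the awk program stays inside single quotes.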

Here I give the full script:
for fileName in 4440319.3_genus.txt 4440372.3_genus.txt 4440373.3_genus.txt 4440378.3_genus.txt 4440379.3_genus.txt 4440380.3_genus.txt 4440381.3_genus.txt
do
echo $fileName
for number in 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95
do
NUM=$(awk -v n="$number" '$5<n{a++}END{print a+0}' "$fileName")
echo "$NUM records are less than $number"
done
done
exit 0

Related

Shell Script Showing syntax error for dollar $

#!/bin/bash
LIMIT='50'
DIR="( $(df -Ph | column -t | awk '{print $5}' | grep -v Use) )"
for i in $DIR;
do
USED=$(df -Ph $i | awk '{print $5}' | sed -ne 2p | cut -d"%" -f1)
if [ "$USED" -gt "$LIMIT" ];
#If used space is bigger than LIMIT
then
#####
fi
done
Why am I getting a syntax error at line 5, in the for loop over the variable $DIR?
I think the fundamental error is the quotes in the assignment of the array. Instead of DIR="( $(...) )", you need to drop the quotes and use DIR=( $(...) ). However, that assignment isn't necessary at all!
You probably shouldn't parse df like this, but you definitely should not be running df multiple times. There's no need for the embedded loop. Since you haven't really shown what you're doing in the cases when the filesystem use is over the limit, it's hard to give better code, but whatever you're doing there can almost certainly be done easily in awk. Or, if not in awk, you can use awk rather than the embedded loop to trigger the action. E.g., you could just do:
df -Ph | awk 'NR>1{printf("%s: %s than limit\n", $1, $5 + 0> limit ? "bigger" : "smaller")}' limit="${LIMIT-50}"
although it probably makes more sense to do:
df -Ph | awk 'NR>1 && $5 + 0 > limit {print $1 " is over limit" > "/dev/stderr"}' limit="${LIMIT-50}"
Note that both of these fail horribly if any of the columns in the output of df contains whitespace (e.g. "map auto_home"). The output of df is intended for human consumption, and is not really suited to this sort of thing. You could do a column count in the awk (or use $(NF-1) instead of $5) and get the Capacity that way, but that's just moving the fragility.
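To see the filter in isolation, it can be run against canned text instead of a live df. The function name sample_df and the table it prints are assumptions made up for this sketch; real df output varies between systems.

```shell
# Canned text standing in for `df -Ph` output (hypothetical values).
sample_df() {
cat <<'EOF'
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   40G   10G  80% /
tmpfs           2.0G     0  2.0G   0% /dev/shm
/dev/sdb1       100G   20G   80G  20% /data
EOF
}

LIMIT=50
# "$5 + 0" turns "80%" into the number 80; NR > 1 skips the header line.
over=$(sample_df | awk 'NR > 1 && $5 + 0 > limit { print $1 " is over limit" }' limit="${LIMIT-50}")
echo "$over"
```

Swapping sample_df for the real df -Ph gives the single-pass version, with the whitespace caveat above still applying.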

Converting string to floating point number without bc in bash shell script

I'm getting load average in a bash shell script like so
load=`echo $(cat /proc/loadavg | awk '{print $1}')`
I know piping to bc
load=`echo $(cat /proc/loadavg | awk '{print $1}') \> 3 | bc -l`
is used in almost all examples of how to cast $load as an int but this box does not have bc installed and I am not allowed to add it.
I tried
int=`perl -E "say $load - 0"`
I tried
int=${load%.*}
I tried
int=`printf -v int %.0f "$load"`
What I want to be able to do is
if [ "$int" -gt 3.5 ]; then
How do I get that to evaluate as intended?
You can use awk to produce a success/failure depending on the condition:
# exit 0 (success) when load average greater than 3.5, so take the branch
if awk '{ exit !($1 > 3.5) }' /proc/loadavg; then
# load average was greater than 3.5
fi
Unfortunately, since "success" is 0 in the shell, you have to invert the logic of the condition to make awk exit with the required status. Obviously, you can do this in a number of ways, such as changing > to <=.
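The same exit-status trick can be wrapped in a small function and tried against a file with known contents. The helper name load_above and the sample line are hypothetical; on a real system the first argument would be /proc/loadavg.

```shell
# Hypothetical helper: succeeds (exit 0) when the first field of file $1
# is greater than the threshold $2.
load_above() {
  awk -v max="$2" '{ exit !($1 > max) }' "$1"
}

tmp=$(mktemp)
echo '4.21 3.10 2.05 1/234 5678' > "$tmp"   # made-up loadavg-style line

if load_above "$tmp" 3.5; then high=yes; else high=no; fi
if load_above "$tmp" 5;   then very_high=yes; else very_high=no; fi

echo "$high $very_high"
rm -f "$tmp"
```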
You don't need any external tools (like awk) to read this stuff. Load average from /proc/loadavg is always formatted with two decimal places, so you can do this:
read load _ < /proc/loadavg
if [ ${load/./} -gt 350 ]; then
# do something
fi

integer expression expected [bash does not understand .]

I made a small script to kill PID's if they exceed expected cpu usage. It works, but there is a small problem.
Script:
while [ 1 ];
do
cpuUse=$(ps -eo %cpu | sort -nr | head -1)
cpuMax=80
PID=$(ps -eo %cpu,pid | sort -nr | head -1 | cut -c 6-20)
if [ $cpuUse -gt $cpuMax ] ; then
kill -9 "$PID"
echo Killed PID $PID at the usage of $cpuUse out of $cpuMax
fi
exit 0
sleep 1;
done
It works if the integer is three digits long but fails if it drops to two and displays this:
./kill.sh: line 7: [: 51.3: integer expression expected
My question here is, how do I make bash understand the divider so it can kill processes under three digits.
You are probably getting leading space in that variable. Try piping with tr to strip all spaces first:
cpuUse=$(ps -eo %cpu | sort -nr | head -1 | tr -d '[:space:]')
Remove text after dot from cpuUse variable:
cpuUse="${cpuUse%%.*}"
Also better to use quotes in if condition:
if [ "$cpuUse" -gt "$cpuMax" ] ; then
OR better use arithmetic operator (( and )):
if (( cpuUse > cpuMax )); then
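Put together, the clean-up steps might look like the sketch below. The starting value with its leading space is an invented stand-in for what ps prints, and cpuInt and verdict are names introduced only for this example.

```shell
cpuUse=' 51.3'     # hypothetical value, including the leading space
cpuMax=80

# Strip whitespace, then drop the decimal point and everything after it.
cpuUse=$(printf '%s' "$cpuUse" | tr -d '[:space:]')
cpuInt=${cpuUse%%.*}

if [ "$cpuInt" -gt "$cpuMax" ]; then verdict=kill; else verdict=keep; fi
echo "$cpuInt $verdict"
```

Truncating rather than rounding means 80.9 is treated as 80, which is usually acceptable for a threshold check like this one.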
As you see, bash doesn't grok non-integer numbers. You need to eliminate the decimal point and the following digits from $cpuUse before doing the comparison:
cpuUse=$(sed 's/\..*//' <<<$cpuUse)
However, this is really a job for awk. It will simplify much of what you're doing. Whenever you find yourself with greps of greps, or head and then cuts, you should be dealing with awk. Awk can easily combine these multiple piped seds, greps, cuts, heads, into a single command.
By the way, the correct ps command is:
$ ps -eo %cpu="",pid=""
Using the ="" will eliminate the heading and simply give you the CPU and PID.
Looking at your program, there's no real need to sort. You're simply looking for all processes above that $cpuMax threshold:
ps -eo %cpu="",pid="" | awk '$1 > 80 {print $2}'
That prints out the PIDs which are over your threshold. Awk automatically loops through your entire input line by line. Awk also automatically divides each line into columns, and assigns each a variable from $1 and up. You can change the field divider with the -F parameter.
The above awk says look for all lines where the first column is above 80%, (the CPU usage) and print out the second column (the pid).
If you want some flexibility and be able to pass in different $cpuMax, you can use the -v parameter to set Awk variables:
ps -eo %cpu="",pid="" | awk -vcpuMax=$cpuMax '$1 > cpuMax {print $2}'
Now you can capture the output of this command and kill all those processes:
pid=$(ps -eo %cpu="",pid="" | awk -vcpuMax=$cpuMax '$1 > cpuMax {print $2}')
if [[ -n $pid ]]
then
kill -9 $pid
echo "Killed the following processes:" $pid
fi
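Before pointing kill -9 at live processes, the filter can be tried as a dry run on canned input. The function sample_ps and its four rows are invented for this sketch, and the loop only echoes what would be killed.

```shell
# Canned text standing in for `ps -eo %cpu=,pid=` output (hypothetical rows).
sample_ps() {
cat <<'EOF'
 91.5  1234
  0.3  2345
 85.0  3456
 51.3  4567
EOF
}

cpuMax=80
pids=$(sample_ps | awk -v cpuMax="$cpuMax" '$1 > cpuMax { print $2 }')

# Dry run: report instead of killing.
for pid in $pids; do
  echo "would kill $pid"
done
```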

Bash escaping and syntax

I have a small bash file which I intend to use to determine my current ping vs my average ping.
#!/bin/bash
output=($(ping -qc 1 google.com | tail -n 1))
echo "`cut -d/ -f1 <<< "${output[3]}"`-20" | bc
This outputs my ping - 20 ms, which is the number I want. However, I also want to prepend a + if the number is positive and append "ms".
This brings me to my overarching problem: Bash syntax regarding escaping and such heavy "indenting" is kind of flaky.
While I'll be satisfied with an answer of how to do what I wanted, I'd like a link to, or explanation of how exactly bash syntax works dealing with this sort of thing.
output=($(ping -qc 1 google.com | tail -n 1))
echo "${output[3]}" | awk -F/ '{printf "%+fms\n", $1-20}'
The + modifier in printf tells it to print the sign, whether it's positive or negative.
And since we're using awk, there's no need to use cut or bc to get a field or do arithmetic.
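The effect of the + flag is easy to check with fixed input. The helper name signed_delta, the three-decimal format, and the sample rtt strings are assumptions made for this sketch; the original command uses plain %f.

```shell
# Hypothetical helper: $1 is an rtt summary such as min/avg/max/mdev,
# $2 is the baseline in milliseconds to subtract from the first value.
signed_delta() {
  printf '%s\n' "$1" | awk -F/ -v base="$2" '{ printf "%+.3fms\n", $1 - base }'
}

slower=$(signed_delta '38.209/38.209/38.209/0.000' 20)
faster=$(signed_delta '12.500/12.500/12.500/0.000' 20)
echo "$slower $faster"
```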
Escaping is pretty awful in bash if you use the deprecated `..` style command expansion. In this case, you have to escape any backticks, which means you also have to escape any other escapes. $(..) nests a lot better, since it doesn't add another layer of escaping.
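A small side-by-side comparison may make the difference concrete. The variable field holds an invented rtt summary; both assignments compute the same string, one with nested backticks and one with nested $(..).

```shell
field='18.209/18.209/18.209/0.000'   # hypothetical rtt summary

# Backticks: the inner pair has to be escaped with backslashes.
old=`echo \`echo $field | cut -d/ -f1\` ms`

# $(..): nests as-is, and quoting inside works as usual.
new=$(echo "$(echo "$field" | cut -d/ -f1) ms")

echo "$old"
echo "$new"
```

Every further level of backtick nesting needs another round of backslashes, which is where the escaping becomes hard to follow.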
In any case, I'd just do it directly:
ping -qc 1 google.com.org | awk -F'[=/ ]+' '{n=$6}
END { v=(n-20); if(v>0) printf("+"); print v}'
Here's my take on it, recognizing that the result from bc can be treated as a string:
output=($(ping -qc 1 google.com | tail -n 1))
output=$(echo "`cut -d/ -f1 <<< "${output[3]}"`-20" | bc)' ms'
[[ "$output" != -* ]] && output="+$output"
echo "$output"
Bash cannot handle floating point numbers. A workaround is to use awk like this:
#!/bin/bash
output=($(ping -qc 1 google.com | tail -n 1))
echo "`cut -d/ -f1 <<< "${output[3]}"`-20" | bc | awk '{if ($1 >= 0) printf "+%fms\n", $1; else printf "%fms\n", $1}'
Note that only non-negative results get the + prefix; a negative result from bc is printed with its own minus sign.
Output:
$ ./testping.sh
+18.209000ms

Reading numbers in scientific notation using bash

As part of an annotation pipeline for De Novo fish genomes I need to compare e-values from BLAST to see whether they are lower than a certain threshold.
To get the semantics right I first evaluated one of the other columns in the BLAST output, and it works fine like this:
for f in FOLDER/*; do
myVar=$(head -1 $f | awk '{print $4}') ;
if [[ $myVar -gt 50 ]]; then echo ..... ;done
$4 is then a column in the BLAST output with whole numerical values (hit length or something)
However, when I try to change the script to work with the e-values, there are some problems with the interpretation of the scientific notation.
What I WOULD like is this:
for f in FOLDER/*; do
myVar=$(head -1 $f | awk '{print $11}') ;
if [[ $myVar -gt 1.0e-10 ]]; then echo ..... ;done
where $11 points to the e-value for each hit.
Could this be done in a not too cumbersome manner in bash?
With awk, it is possible:
for f in FOLDER/*; do awk '$11 < 1e-10 {print $11}' "$f"; done
This doesn't need the variable to be defined first.
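To check that awk really reads the scientific notation as a number, the comparison can be tried on two made-up lines laid out like 12-column BLAST tabular output with the e-value in column 11. The values are invented, and the explicit + 0 is an extra precaution added in this sketch to force a numeric comparison.

```shell
# Two hypothetical hits; column 11 is the e-value.
tmp=$(mktemp)
printf '%s\n' 'q1 s1 98.5 120 1 0 1 120 5 124 3e-25 230' \
              'q2 s2 71.0 60 9 2 4 63 10 70 0.002 41.2' > "$tmp"

# Keep only hits whose e-value is below the threshold.
hits=$(awk '$11 + 0 < 1e-10 { print $1, $11 }' "$tmp")
echo "$hits"
rm -f "$tmp"
```

Adding FNR == 1 to the condition would restrict the test to the first line of each file, which is what head -1 did in the original loop.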
