How to extract specific data from grep command in bash?

I'm trying to write my own script to tell me if I've used more than 500 MiB of my data.
I'm using vnstat -d for the information about data usage.
vnstat -d Output here
The output should:
contain values only from the "total" column
contain only values greater than 500
I want only values from the "total" column, but my output lists data from all the columns.
The following should make this clearer:
#!/bin/bash
for i in `vnstat -d | grep -a [0-9]`; # get numerical values in i (-a flag as vnstat outputs in binary)
do
NUMBER=$(echo $i | grep -o '[5-9][0-9][0-9]'); # store values >500 in a var called NUMBER
echo $NUMBER;
done;
I'm a self-learning newb here so please try not to bash (pun) me.
Current output which I'm receiving from above script:
600
654
925
884
923
871
967
868
My desired output should be:
654
923
967

Simplified:
#!/bin/bash
if [[ $(( $(vnstat -d --oneline|cut -d';' -f6|cut -d. -f1|paste -sd '+') )) -ge 500 ]];then
echo 500 MiB reached
fi
(The script takes the specified field from the one-line, CSV-like output for each interface, keeps only the integer part of each value, and sums them. It then checks whether that sum is greater than or equal to 500 and, if it is, prints a message.)
Note:
-f6 will parse the "total for today" traffic
you can replace it with:
-f4 = rx for today
-f5 = tx for today

You want to parse a pipe-delimited table and check only one column. There are tools better suited to this job than grep: for example, a small bash script that uses the cut command to extract and process the data, or awk.
Here is a solution with awk that prints the values of the total column that are at least 500. Pipe your command's output to:
awk -F "|" '($3+0>=500){print $3}'
-F sets the field delimiter to |
$3+0 converts a string that starts with a number to that number, so that we can compare it numerically.
Now, if you really want every value in the total column above 500 MiB, the expected output should also include all values expressed in GiB, since in your screenshot they are all above 1000 MiB (the smallest, 0.98 GiB, is about 1003 MiB). So we can add this to the first condition:
awk -F "|" '($3 ~ /GiB/ || $3+0>500){print $3}'
Now if you want the output to be only integers in MiB, we can modify it to:
awk -F "|" '($3 ~ /GiB/){$3=1024*$3+0} ($3+0>500){printf "%.0f\n",$3}'
Here we convert all GiB values to MiB, and we do the comparison after that.
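The unit handling above can be tried without vnstat installed. The following is a sketch under assumptions: the helper name totals_over_mib and the sample table are made up to resemble the layout in the question, and the KiB rule is an addition that the commands above do not have (without it, a value such as 900 KiB would pass a numeric test against 500).

```shell
#!/bin/bash
# totals_over_mib (hypothetical helper): read a pipe-delimited table on stdin
# and print, in whole MiB, every value of the third field above the limit $1.
totals_over_mib() {
    awk -F'|' -v limit="$1" '
        $3 ~ /GiB/ { $3 = 1024 * $3 }   # GiB -> MiB
        $3 ~ /KiB/ { $3 = $3 / 1024 }   # KiB -> MiB (added assumption)
        $3 + 0 > limit { printf "%.0f\n", $3 }
    '
}

# Made-up sample laid out like the table in the question.
totals_over_mib 500 <<'EOF'
     day         rx      |     tx      |    total    |   avg. rate
 ------------------------+-------------+-------------+---------------
  01/01/18   300.00 MiB |  354.32 MiB |  654.32 MiB |   62.00 kbit/s
  01/02/18   100.00 MiB |   50.00 MiB |  150.00 MiB |   14.00 kbit/s
  01/03/18     0.90 GiB |   80.00 MiB |    0.98 GiB |   95.00 kbit/s
  01/04/18   500.00 KiB |  400.00 KiB |  900.00 KiB |    0.10 kbit/s
EOF
```

With real data the call would be vnstat -d | totals_over_mib 500.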

I'd use awk. Something like (untested)
vnstat -d | awk '$1 == "estimated" { exit }
($9 == "GiB" && $8 > 0.5) ||
($9 == "MiB" && $8 > 500) { print $8 " " $9 }'

#!/bin/bash
IFS=$'\n'
for i in `vnstat -d`; do # iterate over each line
VALUE=$(echo $i | cut -d\| -f3) # split the line by '|' and take the total value with its unit, in case you want to check for GiB values
NUMBER=$(echo $VALUE | grep -o '[0-9.]*' | head -n 1 | cut -d. -f1) # extract the number and store its integer part in NUMBER
if [[ $NUMBER -ge 500 && "$VALUE" == *"MiB"* || "$VALUE" == *"GiB"* ]]; then # if the value is at least 500 MiB, or it is in GiB
echo $VALUE; # echo the value
fi
done
Of course you can strip out the GiB check if you want to.
Edit: Added IFS=$'\n' at the beginning. This makes the for loop split the command output on newlines only.

vnstat has several options to format the output.
You can use vnstat --dumpdb, vnstat --json or vnstat --xml to have well-formatted data that you can then parse more easily (for example with jq if you choose the JSON format).
For example :
vnstat --json | jq '.interfaces[] | select(.id == "eth0") | .traffic | .days[1] | .rx'
will extract the number of kiB received on the interface eth0 yesterday (the day 0 is today, 1 is yesterday, etc)
To have the total rx+tx, you can use
vnstat --json | jq '.interfaces[] | select(.id == "eth0") | .traffic | .total | .rx+.tx'
You can also sum several days, for example today and yesterday :
vnstat --json | jq '.interfaces[] | select(.id == "eth0") | .traffic | [.days[0,1] | .rx+.tx] | add'
Instead of days, you can also reference "months" or "hours" (for hours, be careful: the id does not have the same meaning, it identifies the hour itself).

Related

Integer expected error in script

I am trying to write a simple script to monitor disk usage. I keep getting integer expression expected errors at line 5. (THRESHOLD value is intentionally set low for testing.)
Here is my script
#!/bin/bash
CURRENT=$(df -hP | grep / | awk '{ print $5}' | sed 's/%//g')
THRESHOLD=10
if [ "$CURRENT" -gt "$THRESHOLD" ] ; then
mail -s 'Disk Space Alert' john.kenny#ngc.com << EOF
Your root partition remaining free space is critically low. Used: $CURRENT%
EOF
fi
My screen output looks like this
./monitor_disk_space.sh: line 5: [: 7
0
22
1
1
1
1
1
1: integer expression expected
I'm new to bash scripts and especially awk. Any suggestions would be appreciated.

As you can see you're getting a string of newline-separated values from your pipeline. This string is not in itself an integer, so it can't be compared to $THRESHOLD.
Assuming you'd like to send the message if any filesystem is above $THRESHOLD percent full, you may use
df -hP | awk '/\// { sub("%", "", $5); print $5 }' |
while read number; do
if [ "$number" -gt "$THRESHOLD" ]; then
mail ...
break
fi
done
This would pass the values, one by one, into a loop that would compare them against $THRESHOLD. If any value is larger, the mail is sent and the loop exits (via the break).
I also took the liberty of shortening your pipeline to just df+awk, as awk is more than capable of doing the work of both grep and sed.
If you only want to check the root partition, then use df -hP / in the pipeline above.
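The point that each value has to be compared on its own can be tried on sample text. The sketch below is a variation on the loop above rather than the same code: instead of stopping at the first hit, it lists every mount point over the limit. The helper name over_threshold and the sample df output are hypothetical.

```shell
#!/bin/bash
# over_threshold (hypothetical helper): read `df -P`-style text on stdin and
# print "mountpoint percent" for each filesystem above the limit given in $1.
over_threshold() {
    local limit=$1 fs blocks used avail pcent mount
    while read -r fs blocks used avail pcent mount; do
        pcent=${pcent%\%}                 # strip the trailing % sign
        case $pcent in
            ''|*[!0-9]*) continue ;;      # skip the header line
        esac
        if [ "$pcent" -gt "$limit" ]; then
            echo "$mount $pcent"
        fi
    done
}

# Made-up sample; on a real system use: df -P | over_threshold 10
over_threshold 10 <<'EOF'
Filesystem     1024-blocks    Used Available Capacity Mounted on
/dev/sda3         18000000 9100000   8700000      52% /
tmpfs              1000000       0   1000000       0% /dev/shm
/dev/sdb1         50000000 3500000  46500000       7% /data
EOF
```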

CURRENT=$(df -hP | grep / | awk '{ print $5}' | sed 's/%//g')
df -hP shows a summary of disk usage.
grep / filters out the header line.
awk '{print $5}' prints the 5th column, which is the percentage usage for each file system.
sed 's/%//g' deletes the % character. (There's only one, so the g is unnecessary. I might have used tr -d %, but it doesn't really matter.)
$(...) captures the output of the above -- which is going to be multiple lines of output, each of which should contain an integer.
The -gt operator requires a single integer for each of its arguments.
I think the problem is the grep /, which prints every line containing a / character (that's probably going to be everything except the header line). Your message indicates that you're interested in the root filesystem.
Changing grep / to grep /$ is one simple solution.
But passing / as an argument to the df command, so it displays usage only for the root file system, is even simpler.
Here's how I might do it:
CURRENT=$(df / | awk 'NR == 2 { print $5 }' | tr -d %)
You could incorporate the deletion of the % character into the awk command, but that would be a little more complicated.
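Folding the % removal into awk, as mentioned above, might look like the following sketch. The helper name usage_percent is hypothetical, and the function reads df output on stdin so that it can be tried on sample text.

```shell
#!/bin/bash
# usage_percent (hypothetical helper): print the 5th column of the second
# input line with the % sign removed.
usage_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Made-up sample; on a real system use: CURRENT=$(df -P / | usage_percent)
sample='Filesystem     1024-blocks    Used Available Capacity Mounted on
/dev/sda3         18000000 9100000   8700000      52% /'

CURRENT=$(printf '%s\n' "$sample" | usage_percent)
echo "$CURRENT"
```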

Why not do it all in awk?
$ df -hP |
awk -v th=10 '/\// {if($5+0>th)
system("echo Your ... " $5 " | mail -s \"Disk Space Alert\" xxx@example.com")}'

hdd script in bash - strange output

Trying to check the available hdd space via a script:
df -h :
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 18G 9.1G 8.7G 52% /
The commands are :
com=`df -h | awk '{print $5}' | grep % | grep -v Use | sort -n | tail -4 | cut -d % -f1`
echo $com
52 74 100 100
I want to isolate "52" for my checks, so:
for i in ${com[@]};do
> echo ${com[0]:0:2}
> done
52
52
52
52
OK, I managed to retrieve the correct number for my later checks, but why does the command return the number "52" four times?
Thanks a lot

You don't need a bash script for such a trivial use case. Also, you are storing the output of a chain of awk and grep commands in a plain string variable, not in an array.
You just need a simple awk command:
df -h | awk 'NR==1{for(i=1;i<=NF;i++) if ($i == "Use%"){ ind=i; break}} NR==2 {n=split($0, val); used=val[ind]; sub(/%/,"",used); print used}'
The command first finds which column of the header line holds Use%, then reads the value in that column from the next row.
To use the output in a variable, store the result of a command substitution as below:
used_storage=$(df -h | awk 'NR==1{for(i=1;i<=NF;i++) if ($i == "Use%"){ ind=i; break}} NR==2 {n=split($0, val); used=val[ind]; sub(/%/,"",used); print used}')
echo "$used_storage"
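As for the repeated 52, here is a minimal sketch of the difference between a plain string and an array, using the four values from the question as literal sample data. Because com is a string, ${com[@]} expands to that one string, the unquoted expansion is split into four words, and the loop body prints the first two characters of the whole string on every pass. The array name values is an illustration, not part of the original script.

```shell
#!/bin/bash
# Sample data: the four values from the question, stored as a plain string.
com="52 74 100 100"

# ${com[0]} is the whole string, so :0:2 is always its first two characters.
# The expansion is left unquoted on purpose, to reproduce the four iterations.
for i in ${com[@]}; do
    echo "string: ${com[0]:0:2}"
done

# Reading the words into a real array makes each element addressable.
read -r -a values <<< "$com"
echo "first element: ${values[0]}"
echo "element count: ${#values[@]}"
```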

how to find maximum and minimum values of a particular column using AWK [duplicate]

I'm using awk to deal with a simple .dat file, which contains several lines of data and each line has 4 columns separated by a single space.
I want to find the minimum and maximum of the first column.
The data file looks like this:
9 30 8.58939 167.759
9 38 1.3709 164.318
10 30 6.69505 169.529
10 31 7.05698 169.425
11 30 6.03872 169.095
11 31 5.5398 167.902
12 30 3.66257 168.689
12 31 9.6747 167.049
4 30 10.7602 169.611
4 31 8.25869 169.637
5 30 7.08504 170.212
5 31 11.5508 168.409
6 31 5.57599 168.903
6 32 6.37579 168.283
7 30 11.8416 168.538
7 31 -2.70843 167.116
8 30 47.1137 126.085
8 31 4.73017 169.496
The commands I used are as follows.
min=`awk 'BEGIN{a=1000}{if ($1<a) a=$1 fi} END{print a}' mydata.dat`
max=`awk 'BEGIN{a= 0}{if ($1>a) a=$1 fi} END{print a}' mydata.dat`
However, the output is min=10 and max=9.
(The similar commands can return me the right minimum and maximum of the second column.)
Could someone tell me where I was wrong? Thank you!

Awk guesses the type.
String "10" is less than string "4" because character "1" comes before "4".
Force a type conversion, using addition of zero:
min=`awk 'BEGIN{a=1000}{if ($1<0+a) a=$1} END{print a}' mydata.dat`
max=`awk 'BEGIN{a= 0}{if ($1>0+a) a=$1} END{print a}' mydata.dat`

a non-awk answer:
cut -d" " -f1 file |
sort -n |
tee >(echo "min=$(head -1)") \
> >(echo "max=$(tail -1)")
That tee command is perhaps a bit too clever. tee duplicates its stdin stream to the files named as arguments, and it also streams the same data to stdout. I'm using process substitutions to filter the streams.
The same effect can be used (with less flourish) to extract the first and last lines of a stream of data:
cut -d" " -f1 file | sort -n | sed -n '1s/^/min=/p; $s/^/max=/p'
or
cut -d" " -f1 file | sort -n | {
read line
echo "min=$line"
while read line; do max=$line; done
echo "max=$max"
}

Your problem was simply that in your script you had:
if ($1<a) a=$1 fi
and that final fi is not part of awk syntax, so it is treated as a variable. That makes a=$1 fi a string concatenation, so you are TELLING awk that a contains a string, not a number, hence the string comparison instead of a numeric one in $1<a.
More importantly in general, never start with some guessed value for max/min, just use the first value read as the seed. Here's the correct way to write the script:
$ cat tst.awk
BEGIN { min = max = "NaN" }
{
min = (NR==1 || $1<min ? $1 : min)
max = (NR==1 || $1>max ? $1 : max)
}
END { print min, max }
$ awk -f tst.awk file
4 12
$ awk -f tst.awk /dev/null
NaN NaN
$ a=( $( awk -f tst.awk file ) )
$ echo "${a[0]}"
4
$ echo "${a[1]}"
12
If you don't like NaN pick whatever you'd prefer to print when the input file is empty.
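The effect of the stray fi described in this answer can be reproduced in isolation. In the sketch below, compare_first_two is a hypothetical helper that compares the first two fields of a line twice: once after concatenating the second field with an unset variable named fi, and once after forcing both sides to numbers with +0.

```shell
#!/bin/bash
# compare_first_two (hypothetical helper): show string versus numeric
# comparison of field 1 against field 2.
compare_first_two() {
    awk '{
        s = $2 fi                      # fi is an unset variable: s is a string
        print "string:  " ($1 < s     ? "less" : "not less")
        print "numeric: " ($1+0 < s+0 ? "less" : "not less")
    }'
}

echo "10 4" | compare_first_two
```

For the input 10 4, the string comparison reports "less" because the character 1 sorts before 4, while the numeric comparison reports "not less".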

Late, but here is a shorter command that makes no initial assumption about the values:
awk '(NR==1){Min=$1;Max=$1};(NR>=2){if(Min>$1) Min=$1;if(Max<$1) Max=$1} END {printf "The Min is %d ,Max is %d",Min,Max}' FileName.dat

A very straightforward solution (if it's not compulsory to use awk):
Find Min --> sort -n -r numbers.txt | tail -n1
Find Max --> sort -n -r numbers.txt | head -n1
You can use a combination of sort, head, tail to get the desired output as shown above.
(PS: If you want to extract the first column, or any other column, you can use the cut command. For example, to extract the first column: cut -d " " -f 1 sample.dat)

#minimum
cat your_data_file.dat | sort -nk3,3 | head -1
#this will find the minimum of column 3
#maximum
cat your_data_file.dat | sort -nk3,3 | tail -1
#this will find the maximum of column 3
#to search column 2, use -nk2,2
#assign to a variable and use it
min_col=`cat your_data_file.dat | sort -nk3,3 | head -1 | awk '{print $3}'`
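The sort-based approach above can be wrapped so that it returns only the two values for a chosen column. This is a sketch under assumptions: col_min_max is a hypothetical helper, the sample rows are a subset of the data in the question, and LC_ALL=C is set so that the decimal point is interpreted the same way regardless of locale.

```shell
#!/bin/bash
# col_min_max (hypothetical helper): print "min max" of column $1 from stdin.
# The input is sorted numerically on that column once; the first and last
# values of the sorted stream are the minimum and the maximum.
col_min_max() {
    LC_ALL=C sort -n -k"$1,$1" |
        awk -v c="$1" 'NR == 1 { min = $c } { max = $c } END { if (NR) print min, max }'
}

# A few rows taken from the question's data file.
sample='9 30 8.58939 167.759
10 30 6.69505 169.529
4 30 10.7602 169.611
7 31 -2.70843 167.116
8 30 47.1137 126.085'

printf '%s\n' "$sample" | col_min_max 1
printf '%s\n' "$sample" | col_min_max 3
```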

Use 'df -h' to check % remaining disk space of a specific folder

I am using 'df -h' command to get disk space details in my directory and it gives me response as below :
Now I want to be able to do this check automatically through some batch or script - so I am wondering, if I will be able to check disk space only for specific folders which I care about, as shown in image - I am only supposed to check for /nas/home that it does not go above 75%.
How can I achieve this ? Any help ?
My work till now:
I am using
df -h > DiskData.txt
... this outputs to a text file
grep "/nas/home" "DiskData.txt"
... which gives me the output:
*500G 254G 247G 51% /nas/home*
Now I want to be able to search for the number previous or right nearby '%' sign (51 in this case) to achieve what I want.

This command gives you the usage percentage of the /nas/home directory:
df /nas/home | awk '{ print $4 }' | tail -n 1| cut -d'%' -f1
You can then store the value in a variable and apply an if/else condition to it:
var=`df /nas/home | awk '{ print $4 }' | tail -n 1| cut -d'%' -f1`
if [ "$var" -gt 75 ]; then
: # send email
fi
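Which column holds the percentage depends on whether df wraps a long filesystem name onto its own line, so a sketch that looks for the field ending in % may be easier to reuse. The helper name last_percent is hypothetical, and the sample line is the grep output quoted in the question.

```shell
#!/bin/bash
# last_percent (hypothetical helper): from df-style text on stdin, print the
# number from the last field of the form "N%" seen in the input.
last_percent() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^[0-9]+%$/) p = $i }
         END { sub(/%/, "", p); print p }'
}

# Sample line from the question; on a real system: df /nas/home | last_percent
var=$(echo "500G 254G 247G 51% /nas/home" | last_percent)
if [ "$var" -gt 75 ]; then
    echo "/nas/home is at ${var}%, above the 75% limit"
else
    echo "/nas/home is at ${var}%"
fi
```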

another variant:
df --output=pcent /nas/home | tail -n 1 | tr -d '[:space:]|%'
--output=pcent - show only the percent value (requires coreutils >= 8.21)

A more concise way without extensive piping could be:
df -h /nas/home | perl -ane 'print substr $F[3],0,-1 if $.==2'
Returns: 51 for your example.
