bash - summing the output of wordcount


Scenario:
I have a bunch of VIPs. An nslookup on a VIP normally returns one public IP. When the load balancer fails, nslookup returns two public IPs. I want to write a script that detects this.
Logic:
for i in vip1 vip2 vip3; do nslookup "$i" | grep -v "<private IP>" | grep 'Address:' | wc -l ; done
In an ideal scenario, the output will look like
1
1
1
If I could sum the output, it would say 3. If something goes wrong, the sum will be greater than 3. I was unable to sum in the above case. Please advise.

echo vip1 vip2 vip3 | xargs -n 1 nslookup | \
awk '/Address/ && !/<private-ip>/ {s++} END{print s}'

Instead of summing, just count the matches in the entire loop. And use grep -c to do the match and count in one step.
for i in vip1 vip2 vip3
do
nslookup "$i"
done | grep -v "<private IP>" | grep -c 'Address'

Do it with awk:
user@host:~# cat blub | awk '{ SUM += $1} END { print SUM }'
4
blub is a file with these contents:
user@host:~# cat blub
1
2
1

To get a sum, you can substitute the command into an arithmetic evaluation construct. If your pipeline produces an integer, try the following loop:
for i in vip1 vip2 vip3; do
((sum += $(nslookup $i | .. rest of pipeline .. | wc -l)))
done
# .. do something with $sum ..
Maybe not the most elegant approach, but it should work.
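The summing loop above can be tried offline with a sketch like the following. fake_lookup and count_addresses are hypothetical names, and fake_lookup only imitates the general shape of nslookup output (a resolver line on the assumed private address 10.0.0.2, then one or two public addresses). The sketch uses sum=$((sum + n)), which is the portable spelling of the ((sum += ...)) form.

```shell
#!/usr/bin/env bash
# fake_lookup is a hypothetical stand-in for nslookup so the sketch runs offline.
# vip2 simulates the failure case and returns two public addresses.
fake_lookup() {
  printf 'Server: 10.0.0.2\nAddress: 10.0.0.2#53\n\n'
  case "$1" in
    vip2) printf 'Address: 203.0.113.7\nAddress: 203.0.113.8\n' ;;
    *)    printf 'Address: 203.0.113.5\n' ;;
  esac
}

# Sum the per-host counts of public "Address:" lines.
count_addresses() {
  sum=0
  for host in "$@"; do
    # Drop the (assumed) private resolver address, then count what is left.
    n=$(fake_lookup "$host" | grep -v '10\.0\.0\.2' | grep -c 'Address:')
    sum=$((sum + n))
  done
  echo "$sum"
}

count_addresses vip1 vip2 vip3   # prints 4: one more than the number of VIPs
```

A sum larger than the number of VIPs then signals that at least one lookup returned an extra address.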

Related

count how many times a word appears in a specific column bash

This is a file I have, named people.txt:
10001:Larry Simpson:65:NewYork:555666777
10002:Jonh Fin:91:Rome:333444555
10003:George Jas:86:Amsterdam:777888999
10004:Larry Simpson:65:NewYork:555666777
10005:Jonh Fin:91:Rome:333444555
I was trying to count how many people there are in a specific city, given as an argument to the script.
The first thing I thought of was:
grep "$1:" people.txt | wc -l
The ":" is there because we can have a city named Amster and another named Amsterdam.
But then I realized that we can have people named Amsterdam, so I tried this to search only the city column:
k=$(awk -F ":" -v loc=$1 -v max=0 ' {if ($4==loc) max++; print max}' people.txt)
echo $k
But now the output is like 0 0 1 1 1. How can I get just the last number of this output?
I also tried cut, but with -f we don't know how long that output is.
Desired output is just
1
Regards

Assuming $1 is equal to "NewYork":
awk -F: -v loc="$1" '$4==loc { cnt++ } END { print cnt}' people.txt
You need to use the END block to print the final count.
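As a small self-contained sketch of the END-block approach: count_city is a hypothetical helper name, and it reads the records on stdin rather than from people.txt. Printing cnt + 0 is an added assumption so that a city with no matches prints 0 instead of an empty line.

```shell
#!/usr/bin/env bash
# count_city CITY: hypothetical helper; counts records whose 4th
# colon-separated field equals CITY exactly.
count_city() {
  awk -F: -v loc="$1" '$4 == loc { cnt++ } END { print cnt + 0 }'
}

people='10001:Larry Simpson:65:NewYork:555666777
10002:Jonh Fin:91:Rome:333444555
10003:George Jas:86:Amsterdam:777888999
10004:Larry Simpson:65:NewYork:555666777
10005:Jonh Fin:91:Rome:333444555'

printf '%s\n' "$people" | count_city NewYork   # prints 2
printf '%s\n' "$people" | count_city Amster    # prints 0 (no partial match)
```

Because the comparison is an exact match on field 4, neither a person named Amsterdam nor a city named Amster is counted by mistake.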

You can just do it with a single grep command:
grep -Ec "^([^:]*:){3}$1:" people.txt

Get Average of Found Numbers in Each File to Two Decimal Places

I have a script that searches through all files in a directory and pulls the number next to the word <Overall>. I now want to average the numbers from each file and print the filename next to the average, to two decimal places. Most of it works except displaying the average. At least I think it works: I'm not sure it pulls every instance in the file, and I can't tell whether the average is right without the precision. I also sort by the average at the end. I'm using awk and bc to get the average; there's probably a better method.
What I have now:
path="/home/Downloads/scores/*"
(for i in $path
do
echo `basename $i .dat` `grep '<Overall>' < $i |
head -c 10 | tail -c 1 | awk '{total += $1} END {print total/NR}' | bc`
done) | sort -g -k 2
The output I get is:
John 4
Lucy 4
Matt 5
Sara 5
But it shouldn't be an integer and it should be to two decimal places.
Additionally, the files I'm searching through look like this:
<Student>John
<Math>2
<English>3
<Overall>5
<Student>Richard
<Math>2
<English>2
<Overall>4

In general, your script does not extract all numbers from each file, but only the first digit of the first number. Consider the following file:
<Overall>123 ...
<Overall>4 <Overall>56 ...
<Overall>7.89 ...
<Overall> 0 ...
The command grep '<Overall>' | head -c 10 | tail -c 1 will only extract 1.
To extract all numbers preceded by <Overall> you can use grep -Eo '<Overall> *[0-9.]*' | grep -o '[0-9.]*' or (depending on your version) grep -Po '<Overall>\s*\K[0-9.]*'.
To compute the average of these numbers you can use your awk command or specialized tools like ... | average (from the package num-utils) or ... | datamash mean 1.
To print numbers with two decimal places (that is 1.00 instead of 1 and 2.35 instead of 2.34567) you can use printf.
#! /bin/bash
path=/home/Downloads/scores/
for i in "$path"/*; do
avg=$(grep -Eo '<Overall> *[0-9.]*' "$i" | grep -o '[0-9.]*' |
awk '{total += $1} END {print total/NR}')
printf '%s %.2f\n' "$(basename "$i" .dat)" "$avg"
done |
sort -g -k 2
Sorting works only if file names are free of whitespace (like space, tab, newline).
Note that you can swap out the two lines after avg=$( with any method mentioned above.
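To check the extraction and rounding without touching real files, the pipeline can be wrapped in a function that reads stdin. avg_overall is a hypothetical name, and the sample input reuses the file layout from the question; this is a sketch, not a replacement for the script above.

```shell
#!/usr/bin/env bash
# avg_overall: hypothetical helper; reads a score file on stdin and prints
# the mean of all <Overall> values with two decimal places.
avg_overall() {
  grep -Eo '<Overall> *[0-9.]+' |     # keep only the "<Overall>N" matches
    grep -Eo '[0-9.]+$' |             # strip the tag, keep the number
    awk '{ total += $1 } END { if (NR) printf "%.2f\n", total / NR }'
}

printf '<Student>John\n<Math>2\n<English>3\n<Overall>5\n<Student>Richard\n<Math>2\n<English>2\n<Overall>4\n' |
  avg_overall   # prints 4.50
```

The if (NR) guard is an added assumption: it prints nothing, rather than dividing by zero, when a file contains no <Overall> entries.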

You can use a sed command to retrieve the values and calculate their average with bc:
# Read stdin into an array, join the values with "+", and let bc compute the average
function avg() { local IFS=+ l; mapfile -t l; bc <<< "scale=2; (${l[*]})/${#l[@]}"; }
# Browse the .dat files, then display for each file the average
find . -iname "*.dat" |
while read f
do
f=${f##*/} # Remove the dirname
# Echoes the file basename and a tabulation (no newline)
echo -en "${f%.dat}\t"
# Retrieves all the "Overall" values and passes them to our avg function
sed -n -E -e 's/<Overall>([0-9]+)/\1/p' "$f" | avg
done
Output example:
score-2 1.33
score-3 1.33
score-4 1.66
score-5 .66

The pipeline head -c 10 | tail -c 1 | awk '{total += $1} END {print total/NR}' | bc needs improvement.
head -c 10 | tail -c 1 leaves only the 10th character of the first Overall line from each file; better drop that.
Instead, use awk to "remove" the prefix <Overall> and extract the number; we can do this by using <Overall> for the input field separator.
Also use awk to format the result to two decimal places.
Since awk did the job, there's no more need for bc; drop it.
The above pipeline becomes awk -F'<Overall>' '{total += $2} END {printf "%.2f\n", total/NR}'.
Don't forget to keep the closing backtick (`) after it.

How to remove all but the last 3 parts of FQDN?

I have a list of IP lookups and I wish to remove all but the last 3 parts, so:
98.254.237.114.broad.lyg.js.dynamic.163data.com.cn
would become
163data.com.cn
I have spent hours searching for clues, including parameter substitution, but the closest I got was:
$ string="98.254.237.114.broad.lyg.js.dynamic.163data.com.cn"
$ string1=${string%.*.*.*}
$ echo $string1
Which gives me the inverted answer of:
98.254.237.114.broad.lyg.js.dynamic
which is everything but the last 3 parts.
A script to do a list would be better than just the static example I have here.
Using CentOS 6, I don't mind if it by using sed, cut, awk, whatever.
Any help appreciated.
Thanks. Now that I have working answers, may I ask a follow-up: when processing the resulting list, if the last part (after the last '.') is 3 characters long, e.g. .com or .net, keep just the last 2 parts.
If this is against protocol, please advise how to post a follow-up question.

If parameter expansion inside another parameter expansion is supported, you can use this:
$ s='98.254.237.114.broad.lyg.js.dynamic.163data.com.cn'
$ # removing last three fields
$ echo "${s%.*.*.*}"
98.254.237.114.broad.lyg.js.dynamic
$ # pass output of ${s%.*.*.*} plus the extra . to be removed
$ echo "${s#${s%.*.*.*}.}"
163data.com.cn
You can also reverse the line, take the required fields, and then reverse again. This makes it easy to change the number of fields kept:
$ echo "$s" | rev | cut -d. -f1-3 | rev
163data.com.cn
$ echo "$s" | rev | cut -d. -f1-4 | rev
dynamic.163data.com.cn
$ # and easy to use with file input
$ cat ip.txt
98.254.237.114.broad.lyg.js.dynamic.163data.com.cn
foo.bar.123.baz.xyz
a.b.c.d.e.f
$ rev ip.txt | cut -d. -f1-3 | rev
163data.com.cn
123.baz.xyz
d.e.f

echo $string | awk -F. '{ if (NF == 2) { print $0 } else { print $(NF-2)"."$(NF-1)"."$NF } }'
NF is the total number of fields separated by ".", so we want the last field (NF), the one before it (NF-1), and the one before that (NF-2).

$ echo $string | awk -F'.' '{printf "%s.%s.%s\n",$(NF-2),$(NF-1),$NF}'
163data.com.cn
Brief explanation:
Set the field separator to ".".
Print only the last 3 fields using $(NF-2), $(NF-1), and $NF.
There's also another option you may try (GNU awk only, since it relies on FPAT):
$ echo $string | awk -v FPAT='[^.]+[.][^.]+[.][^.]+$' '{print $NF}'
163data.com.cn

It sounds like this is what you need:
awk -F'.' '{sub("([^.]+[.]){"NF-3"}","")}1'
e.g.
$ echo "$string" | awk -F'.' '{sub("([^.]+[.]){"NF-3"}","")}1'
163data.com.cn
but with just one sample input/output it's just a guess.
Regarding your follow-up question, this might be what you're asking for:
$ echo "$string" | awk -F'.' '{n=(length($NF)==3?2:3); sub("([^.]+[.]){"NF-n"}","")}1'
163data.com.cn
$ echo 'www.google.com' | awk -F'.' '{n=(length($NF)==3?2:3); sub("([^.]+[.]){"NF-n"}","")}1'
google.com
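The same rule can also be sketched with shell parameter expansion alone, without awk. last_labels and shorten are hypothetical helper names. The sketch assumes the input has no trailing dot, and it returns the name unchanged when it has fewer labels than requested.

```shell
#!/usr/bin/env bash
# last_labels N NAME: hypothetical helper; keeps the last N dot-separated labels.
last_labels() {
  n=$1 name=$2 head=$2 i=0
  while [ "$i" -lt "$n" ]; do
    case $head in
      *.*) head=${head%.*} ;;                  # strip one trailing label
      *)   printf '%s\n' "$name"; return ;;    # N labels or fewer: keep as is
    esac
    i=$((i + 1))
  done
  # head now holds everything before the last N labels; remove it and its dot.
  printf '%s\n' "${name#"$head".}"
}

# shorten NAME: keep 2 labels when the last label has 3 characters, else 3.
shorten() {
  tld=${1##*.}
  if [ "${#tld}" -eq 3 ]; then last_labels 2 "$1"; else last_labels 3 "$1"; fi
}

shorten 98.254.237.114.broad.lyg.js.dynamic.163data.com.cn   # prints 163data.com.cn
shorten www.google.com                                       # prints google.com
```

For a list, something like `while read -r name; do shorten "$name"; done < ip.txt` would apply it line by line.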

A version that uses only expr:
expr "$string" : '.*\.\(.*\..*\..*\)'
To use it with a file, you can iterate with xargs:
File:
head list.dat
98.254.237.114.broad.lyg.js.dynamic.163data.com.cn
98.254.34.56.broad.kkk.76onepi.co.cn
98.254.237.114.polst.a65dal.com.cn
Iterating over the whole file:
cat list.dat | xargs -I^ -L1 expr "^" : '.*\.\(.*\..*\..*\)'
Note: this won't be very efficient at large scale, so you need to decide for yourself whether it is good enough.
Regexp explanation:
.*\.  \( .*\..*\..* \)
.*\.          the leading part and its final dot, which we are not interested in (outside the brackets)
\( ... \)     the brackets mark the part we extract
\. (inside)   each \. is a literal dot separating the words, which fixes their number at three
details:
http://tldp.org/LDP/abs/html/string-manipulation.html -> Substring Extraction

grep two integers and compare them

I am trying to write a script that does the following:
Given a string that look like this "There are 5 apples and 3 oranges"
Extract the two integers (5, 3)
Compare them
I got the extract part done.
NUM=echo $String | grep -o "[0-9]\+"
But NUM will be something like this:
5
3
\n
I tried ${NUM[0]} and ${NUM[@]} just to get the first value but it doesn't work.
Any suggestions?

I would do this with process substitution and mapfile:
$ mapfile -t nums < <(grep -Eo '[[:digit:]]+' <<< 'There are 5 apples and 3 oranges')
$ declare -p nums
declare -a nums='([0]="5" [1]="3")'
This makes sure that only newlines separate array elements. This wouldn't be a problem in this case as the matches are sequences of digits, but it's a robust approach that would work for any pattern.
Notice that mapfile requires Bash 4.0 or newer.

The way you assign to NUM is incorrect.
So is the grep pattern in your post.
Write like this:
input='There are 5 apples and 3 oranges'
nums=($(grep -Eo '[0-9]+' <<< "$input"))
${nums[0]} will contain the first number, ${nums[1]} the 2nd, and so on.
If the input comes from a command:
nums=($(cmd | grep -Eo '[0-9]+'))
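The question also asks to compare the two numbers. A minimal sketch of that step follows, with compare_first_two as a hypothetical name. It relies on the same word splitting as the array assignment above, but stores the matches in the function's positional parameters so that it also runs in shells without arrays.

```shell
#!/usr/bin/env bash
# compare_first_two STRING: hypothetical helper; extracts the integers in
# STRING and reports how the first compares to the second.
compare_first_two() {
  # Unquoted on purpose: word splitting turns grep's output lines into $1, $2, ...
  set -- $(printf '%s\n' "$1" | grep -Eo '[0-9]+')
  [ "$#" -ge 2 ] || { echo 'need two integers' >&2; return 1; }
  if [ "$1" -gt "$2" ]; then
    echo greater
  elif [ "$1" -lt "$2" ]; then
    echo lesser
  else
    echo equal
  fi
}

compare_first_two 'There are 5 apples and 3 oranges'   # prints greater
compare_first_two 'There are 2 apples and 3 oranges'   # prints lesser
```

The unquoted substitution is safe here only because the pattern can produce nothing but digits; for arbitrary patterns the mapfile approach is the more robust choice.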

With GNU awk for FPAT:
$ echo 'There are 5 apples and 3 oranges' |
awk -v FPAT='[0-9]+' '{print ($1 > $2 ? "greater" : "lesser")}'
greater
$ echo 'There are 2 apples and 3 oranges' |
awk -v FPAT='[0-9]+' '{print ($1 > $2 ? "greater" : "lesser")}'
lesser

With GNU awk:
gawk '{if($1>$2){print $1">"$2}else if($1<$2){print $1"<"$2} else {print $1"="$2}}' FPAT='[0-9]+' <<<'There are 5 apples and 8 oranges'
The value of FPAT should be a string that provides a regular
expression. This regular expression describes the contents of each
field.

print line number of output in shell script

I have a script that prints out the average time when pinging a server, shown below:
ping -c3 "${I}" | tail -1 | awk '{print $4}' | cut -d '/' -f 2 | sed 's/$/\tms/'
How can I add the line number to the output of the script above when pinging a list of servers?
My actual output when pinging a list of 3 hosts is:
6.924 ms
100.099 ms
7.756 ms
I want the output to be like this:
1,6.924 ms
2,100.099 ms
3,7.756 ms
so that this can be read by Excel :)
Thanks in advance!

Pipe your output through perl:
echo -e 'aa\nbb' | perl -ne 'print $., ",", $_'
Output:
1,aa
2,bb

Is that what you want?
C=1
for I in 'host1' 'host2' 'host3'
do
ping -c3 "${I}" | tail -1 | awk '{print $4}' | cut -d '/' -f 2 | echo "$C,$(sed 's/$/\tms/')"
C=$((C+1))
done

The standard tool for line numbering is nl. Pipe your output to nl -s, (the trailing comma is the separator argument). That is:
for I; do
ping -c3 "${I}" | awk -F/ 'END{print $5, "\tms"}'
done | nl -s,
Since you haven't specified how the list is generated, I'm just showing the case where the list of hosts to be pinged is given on the command line. Note that this introduces leading whitespace before the line number, so you might want to filter that through sed to remove.
Of course, this script is spending most of its time waiting for the ping, and you probably want to speed it up by running the pings in parallel. In that case, it is better to add the line number at the beginning so you can get a stable sort in the output:
line=1
{ for I; do ping -c3 $I | awk -F/ 'END{
printf( "%d,%s\tms\n", line,$5 )}' line=$line &
: $((line +=1 ))
done; wait; } | sort -n
In this case, the wait is not necessary since sort will block until all of the pings have closed their output, but the wait becomes necessary if you add any processes in the pipeline before the sort that do not necessarily wait for all of their input before doing any processing, so it is a good practice to leave the wait in place.
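If the leading whitespace that nl adds is unwanted, the numbering can also be done in awk. This is a sketch with a hypothetical helper name, number_lines, fed with the sample timings from the question instead of live ping output.

```shell
#!/usr/bin/env bash
# number_lines: hypothetical helper; prefixes each input line with its
# line number and a comma, with no padding.
number_lines() {
  awk '{ print NR "," $0 }'
}

printf '6.924\tms\n100.099\tms\n7.756\tms\n' | number_lines
# prints:
# 1,6.924	ms
# 2,100.099	ms
# 3,7.756	ms
```

In the sequential loop this would take the place of nl -s, after done.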
