Conditional Sort using Awk or sort - bash

Alright, so I asked a question a week or so ago about how I could use sed or awk to extract a block of text between two blank lines, as well as omit part of the extracted text. The answers I got pretty much satisfied my needs, but now I'm doing something extra for fun (and for OCD's sake).
I want to sort the output from awk in this round. I found this question & answer but it doesn't quite help me to solve the problem. I've also tried wrapping my head around a lot of awk documentation as well to try and figure out how I could do this, to no avail.
So here's the block of code in my script that does all the dirty work:
# This block of stuff fetches the nameservers as reported by the registrar and DNS zone
# Then it gets piped into awk to work some more formatting magic...
# The following is a step-for-step description since I can't put comments inside the awk block:
# BEGIN:
# Set the record separator to a blank line
# Set the input/output field separators to newlines
# FNR == 3:
# The third block of dig's output is the nameservers reported by the registrar
# Also blanks the last field & strips it since it's just a useless dig comment
dig +trace +additional "$host" | \
awk -v host="$host" '
BEGIN {
RS = "";
FS = "\n"
}
FNR == 3 {
print "Nameservers of",host,"reported by the registrar:";
OFS = "\n";
$NF = ""; sub( /[[:space:]]+$/, "" );
print
}
'
And here's the output if I pass google.com in as the value of $host (other hostnames may produce output of differing line counts):
Nameservers of google.com reported by the registrar:
google.com. 172800 IN NS ns2.google.com.
google.com. 172800 IN NS ns1.google.com.
google.com. 172800 IN NS ns3.google.com.
google.com. 172800 IN NS ns4.google.com.
ns2.google.com. 172800 IN A 216.239.34.10
ns1.google.com. 172800 IN A 216.239.32.10
ns3.google.com. 172800 IN A 216.239.36.10
ns4.google.com. 172800 IN A 216.239.38.10
The idea is, using either the existing block of awk, or piping awk's output into a combination of more awk, sort, or whatever else, sort that block of text using a conditional algorithm:
if ( column 4 == 'NS' )
sort by column 5
else // This will ensure that the col 1 sort includes A and AAAA records
sort by column 1
I've pretty much got the same preferences for answers as the previous question:
Most important of all, it must be portable since I've encountered different behaviour between OS X (my home system) and Fedora (what I use at work) when using sed (had to replace it with gsed on OS X) and grep's -m flag (used in another script)
An explanation of how the solution works would be very much appreciated, as a learning opportunity more than anything else. I learned quite a bit from the awk solution provided in the previous question.
If the solution can be implemented within the same block of awk, that would also be awesome
If not, then something simple and eloquent that I can pipe awk's output through would suffice
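For reference, the paragraph-mode trick that the awk block in the question relies on can be tried in isolation. This is only a sketch on made-up input (the a1/b1/c1 lines stand in for dig's blocks), and it strips trailing blanks and newlines with a plain bracket expression rather than [[:space:]], which some older awk builds do not support:

```shell
# Made-up input: four blank-line-separated blocks; block 3 ends with a ";;" comment.
third=$(printf 'a1\na2\n\nb1\n\nc1\nc2\n;; comment\n\nd1\n' |
    awk 'BEGIN { RS = ""; FS = "\n"; OFS = "\n" }   # RS="" means one record per block
         FNR == 3 {
             $NF = ""                # blank the last field (the trailing comment line)
             sub(/[ \t\n]+$/, "")    # drop the separator left behind at the end
             print
         }')
printf '%s\n' "$third"
```

With this input the third block is printed without its final comment line.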

Here's a solution based on @shellter's idea. Pipe the output of your nameserver records to this:
awk '$4 == "NS" {print $1, $5, $0} $4 == "A" || $4 == "AAAA" {print $1, $1, $0}' | sort | cut -d' ' -f3-
Explanation:
With awk, we keep only the NS, A and AAAA records and re-print each line with a two-column prefix: the primary sort key followed by the secondary sort key.
sort then orders the lines by that prefix; because of how the two prefix columns are chosen, the result is the order you wanted.
cut strips the prefix again, leaving the original records.
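The same decorate, sort, undecorate idea can be tried on canned input. The sketch below is a variation, not the answer's exact command: it assumes the records are already isolated, treats every non-NS line (A and AAAA alike) as "sort by column 1", and uses a group digit as the primary key so NS lines always come first. The name sort_records and the sample records (including the documentation-range IPv6 address) are made up for illustration.

```shell
# sort_records: hypothetical helper; reads DNS records on stdin.
sort_records() {
    awk '
        $4 == "NS" { print "0", $5, $0; next }   # NS records: group 0, key = nameserver
                   { print "1", $1, $0 }         # everything else: group 1, key = owner name
    ' | LC_ALL=C sort -k1,1 -k2,2 | cut -d" " -f3-   # sort on the prefix, then strip it
}

sorted=$(sort_records <<'EOF'
google.com. 172800 IN NS ns2.google.com.
google.com. 172800 IN NS ns1.google.com.
ns2.google.com. 172800 IN A 216.239.34.10
ns1.google.com. 172800 IN AAAA 2001:db8::10
ns1.google.com. 172800 IN A 216.239.32.10
EOF
)
printf '%s\n' "$sorted"
```

LC_ALL=C is set only to make the ordering independent of the local collation rules.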

I know you asked for an awk solution, but since you tagged the question with bash too, I thought I'd provide a bash version. It should also be more portable than awk ;)
# the whole line
declare -a lines
# the key to use for sorting
declare -a keys
# insert into the arrays at the appropriate position
function insert
{
local key="$1"
local line="$2"
local count=${#lines[*]}
local i
# go from the end backwards
for((i=count; i>0; i-=1))
do
# if we have the insertion point, break
[[ "${keys[i-1]}" > "$key" ]] || break
# shift the current item to make room for the new one
lines[i]=${lines[i-1]}
keys[i]=${keys[i-1]}
done
# insert the new item
lines[i]=$line
keys[i]=$key
}
# This block of stuff fetches the nameservers as reported by the registrar and DNS zone
# The third block of dig's output is the nameservers reported by the registrar
# Also blanks the last field & strips it since it's just a useless dig comment
block=0
dig +trace +additional "$host" |
while read -r f1 f2 f3 f4 f5
do
# empty line begins new block
if [ -z "$f1" ]
then
# increment block counter
block=$((block+1))
# and read next line
continue
fi
# if we are not in block #3, read next line
[[ $block == 3 ]] || continue
# ;; ends the block
if [[ "$f1" == ";;" ]]
then
echo "Nameservers of $host reported by the registrar:"
# print the lines collected so far
for((i=0; i<${#lines[*]}; i+=1))
do
echo "${lines[i]}"
done
# don't bother reading the rest
break
fi
# figure out what key to use for sorting
if [[ "$f4" == "NS" ]]
then
key=$f5
else
key=$f1
fi
# add the line to the arrays
insert "$key" "$f1 $f2 $f3 $f4 $f5"
done


Evaluate expression using printf in bash [duplicate]

How to do arithmetic with floating point numbers such as 1.503923 in a shell script? The floating point numbers are pulled from a file as a string. The format of the file is as follows:
1.5493482,3.49384,33.284732,23.043852,2.2384...
3.384,3.282342,23.043852,2.23284,8.39283...
.
.
.
Here is some simplified sample code I need to get working. Everything works fine up to the arithmetic. I pull a line from the file, then pull multiple values from that line. I think this would cut down on search processing time as these files are huge.
# set vars, loops etc.
while [ $line_no -gt 0 ]
do
line_string=`sed -n $line_no'p' $file_path` # Pull Line (str) from a file
string1=${line_string:9:6} # Pull value from the Line
string2=${line_string:16:6}
string3=...
.
.
.
calc1= `expr $string2 - $string7` |bc -l # I tried these and various
calc2= ` "$string3" * "$string2" ` |bc -l # other combinations
calc3= `expr $string2 - $string1`
calc4= "$string2 + $string8" |bc
.
.
.
generic_function_call # Use the variables in functions
line_no=`expr $line_no - 1` # Counter--
done
Output I keep getting:
expr: non-numeric argument
command not found
I believe you should use bc.
For example:
echo "scale = 10; 123.456789/345.345345" | bc
(It's the Unix way: each tool specializes in doing one thing well, and they all work together to do great things. Don't emulate one great tool with another; make them work together.)
Output:
.3574879198
Or with a scale of 1 instead of 10:
echo "scale = 1; 123.456789/345.345345" | bc
Output:
.3
Note that this does not perform rounding.
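If rounding rather than truncation is wanted, one option (an assumption about what is needed, not part of the answer above) is to let awk's printf do it. A minimal sketch with the same division:

```shell
# 123.456789/345.345345 is about 0.35749; bc with scale=1 truncates it to .3,
# while printf "%.1f" rounds to the nearest tenth.
rounded=$(LC_ALL=C awk 'BEGIN { printf "%.1f", 123.456789 / 345.345345 }')
echo "$rounded"   # prints 0.4
```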
I highly recommend switching to awk if you need more complex operations, or to perl for the most complex ones.
For example, your operations done with awk:
# create the test file:
printf '1.5493482,3.49384,33.284732,23.043852,2.2384,12.1,13.4,...\n' > somefile
printf '3.384,3.282342,23.043852,2.23284,8.39283,14.1,15.2,...\n' >> somefile
# do OP's calculations (and DEBUG print them out!)
awk -F',' '
# put no single quote in here... even in comments! you can instead print a: \047
# the -F tells awk to use "," as the field separator, so awk automatically splits each line on it.
# $1=before first "," $2=between 1st and 2nd "," ... etc.
function some_awk_function_here_if_you_want() { # optional function definition
# some actions here. you can even have arguments to the function, etc.
print "DEBUG: no action defined in some_awk_function_here_if_you_want yet ..."
}
BEGIN { rem="Optional START section. Here you can put initialisations, which happen before the FIRST file-s FIRST line is read"
}
(NF>=8) { rem="for each line with at least 8 values separated by commas (and only for lines meeting that condition)"
calc1=($2 - $7)
calc2=($3 * $2)
calc3=($2 - $1)
calc4=($2 + $8)
# uncomment to call this function (ex1): # some_awk_function_here_if_you_want()
# uncomment to call this script:(ex2): # cmd="/path/to/some/script.sh \"" calc1 "\" \"" calc2 "\" ..." ; rem="continued next line"
# uncomment to call this script:(ex2): # system(cmd); close(cmd)
line_no=(FNR-1) # ? why -1? . FNR=line number in the CURRENT file. NR=line number since the beginning (NR>FNR after the first file ...)
print "DEBUG: calc1=" calc1 " , calc2=" calc2 " , calc3=" calc3 " , calc4=" calc4 " , line_no=" line_no
print "DEBUG fancier_examples: see man printf for lots of info on formatting (%...f for floats, %...d for integers, %...s for strings, etc)"
printf("DEBUG: calc1=%d , calc2=%10.2f , calc3=%s , calc4=%d , line_no=%d\n",calc1, calc2, calc3, calc4, line_no)
}
END { rem="Optional END section. Here you can put things that need to happen AFTER the LAST file-s LAST line is read"
}
' somefile # end of the awk script, and the list of file(s) to be read by it.
What about this?
calc=$(echo "$string2 + $string8" | bc)
This makes bc add the values of $string2 and $string8 and saves the result in the variable calc.
If you don't have bc, you can just use awk:
calc=$(echo 2.3 4.6 | awk '{ printf "%f", $1 + $2 }')
scale in bc is the precision: with a scale of 4, bc <<< 'scale=4;22.0/7' gives 3.1428. With a scale of 8 you get 3.14285714, which has 8 digits after the decimal point.
So scale sets the number of decimal places.

What's the most elegant way to add Redis to /etc/services?

So today, April 23rd 2015, the Internet Assigned Numbers Authority has assigned port 6379 to Redis, a frabjous day indeed!
I wish to com·mem·o·rate this splendid occasion by adding the following line to my /etc/services file:
redis 6379/tcp
What would be the best way to go about it? By best I mean, of course, the following:
Needless to say, the new line should be inserted in its proper place (i.e. under the Assigned Numbers block, right after gnutella-rtr 6347/udp on my system)
I've considered the use of various text editors, but it feels out of place
Ideally, the solution should be a copy-pastable one-liner
I can envision the awk script that could do that but I'm looking for something more, a certain je ne sais quoi
Update re @Markus' sed proposal: I'm afraid the problem would be applying this "patch" on other systems that do not necessarily have the same /etc/services file. So, expanding on point #1 above, the solution must ensure that order is kept regardless of which service happens to precede the new line in the file.
Update 2: a few points that seem important to state - a) while not mandatory, the solution's length (or lack of rather) is certainly an important part of its elegance (similarly for external dependencies [i.e. lack of these]); b) I/we assumed that /etc/services is sorted, but it would be interesting to see what happens when it isn't; c) assume that you have root privileges and be careful with that rm / -rf command.
A single line sort that puts it in the right place:
printf 'redis\t\t6379/tcp\n' | sort -k2 -n -o /etc/services -m - /etc/services
Nothing in the rules against answering my own question - this one uses Redis exclusively:
cat /etc/services | redis-cli -x SET services; redis-cli --raw EVAL 'local s = redis.call("GET", KEYS[1]) local b, p, name, port, proto; p = 1 repeat b, p, name, port, proto = string.find(s, "([%a%w\-]*)%s*(%d+)/(%w+)", p) if (p and tonumber(port) > tonumber(ARGV[2])) then s = string.sub(s, 1, b-1) .. ARGV[1] .. "\t\t" .. ARGV[2] .. "/" .. ARGV[3] .. "\t\t\t# " .. ARGV[4] .. "\n" .. string.sub(s, b, string.len(s)) return s end until not(p)' 1 services redis 6379 tcp "remote dictionary server" > /etc/services
Formatted Lua code:
local s = redis.call("GET", KEYS[1])
local b, p, name, port, proto
p = 1
repeat
b, p, name, port, proto = string.find(s, "([%a%w\-]*)%s*(%d+)/(%w+)", p)
if (p and tonumber(port) > tonumber(ARGV[2])) then
s = string.sub(s, 1, b-1) .. ARGV[1] .. "\t\t" .. ARGV[2]
.. "/" .. ARGV[3] .. "\t\t\t# " .. ARGV[4] .. "\n"
.. string.sub(s, b, string.len(s))
return s
end
until not(p)
Note: a similar challenge (https://gist.github.com/jorinvo/2e43ffa981a97bc17259#gistcomment-1440996) had inspired this answer. I chose a pure Lua script approach instead of leveraging Sorted Sets... although I could :)
An idempotent awk one-liner that inserts 6379 in order would be:
awk -v inserted=0 '/^[a-z]/ { if ($2 + 0 == 6379) { inserted=1 }; if (inserted == 0 && $2 + 0 > 6379) { print "redis\t\t6379/tcp"; inserted=1 } } { print $0 }' /etc/services > /tmp/services && mv /tmp/services /etc/services
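To see the idempotent insertion work without touching /etc/services, it can be run against a scratch file. This is a sketch under stated assumptions: insert_redis is a made-up helper name, the three-line sample file is invented, and the END rule (append when no larger port exists) is an addition of this sketch.

```shell
# insert_redis FILE: hypothetical helper; rewrites FILE with the redis line in port order.
insert_redis() {
    tmp=$(mktemp) || return 1
    awk '
        /^[a-z]/ {
            if ($2 + 0 == 6379) inserted = 1          # port already listed: do nothing later
            if (!inserted && $2 + 0 > 6379) {         # first larger port: insert before it
                print "redis\t\t6379/tcp"
                inserted = 1
            }
        }
        { print }                                     # keep every line, comments included
        END { if (!inserted) print "redis\t\t6379/tcp" }   # no larger port found: append
    ' "$1" > "$tmp" && cat "$tmp" > "$1"              # cat keeps the owner and mode of FILE
    rm -f "$tmp"
}

services=$(mktemp)
printf '%s\n' '# sample file' 'gnutella-rtr 6347/udp' 'metatude-mds 6382/udp' > "$services"
insert_redis "$services"
insert_redis "$services"    # second run should change nothing
cat "$services"
```

The `$2 + 0` comparison relies on awk converting the leading digits of a value such as 6347/udp to a number.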
Real men don't use sed/awk :)
TMP_SERVICES=/tmp/services.$RANDOM
while IFS= read -r line
do
printf '%s\n' "$line" >> $TMP_SERVICES
if [[ $line == *"6347/udp"* ]]; then
printf %b "redis\t\t6379/tcp\n" >> $TMP_SERVICES
fi
done<"/etc/services"
mv -fb $TMP_SERVICES /etc/services
How about a python script, Itamar?
It works on the notion of extracting the port number (called the index in the code) and if we are above 6378 but have not yet printed our Redis line, print it, then mark that sentinel true and just print all lines (including the one we are on) after.
#!/usr/bin/env python3
with open("/etc/services") as f:
    lines = f.readlines()
printed = False
for line in lines:
    if printed:
        # Redis line already emitted: copy the rest of the file through
        print(line.rstrip())
        continue
    datafields = line.split()
    if not datafields or line.lstrip().startswith("#"):
        # blank lines and comments are copied unchanged
        print(line.rstrip())
    else:
        try:
            try:
                index, proto = datafields[1].split("/")
                index = int(index)
            except (IndexError, ValueError):
                index, proto = datafields[0].split("/")
                index = int(index)
            if index > 6378:
                if not printed:
                    print("redis 6379/tcp #Redis DSS")
                    printed = True
            print(line.rstrip())
        except Exception:
            print(datafields)
            raise
The relevant section on my file for comparison:
gnutella-rtr 6347/tcp # gnutella-rtr
# Serguei Osokine <osokin@paragraph.com>
# 6348-6381 Unassigned
redis 6379/tcp #Redis DSS
metatude-mds 6382/udp # Metatude Dialogue Server
metatude-mds 6382/tcp # Metatude Dialogue Server
Notice the line above Redis is a range. Short of breaking the range this is a workable solution for me. You could break the range but IMO this works just fine. Splitting the range seems a bit much for a simple, elegant script. Especially considering the likelihood most services files don't have the unassigned ranges listed (this is on OS X) - and that they are in a comment anyway.
UPDATE
If you don't care about the local file and its comments, this gets you all currently assigned ports which are not Reserved or Discard-ed:
curl -s http://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.csv| awk -F',' '$4!~/(Discard|Unassigned|Reserved)/ && $1 && $2+0>0 && $1!~/FIX/ {printf "%-16s\t%s/%s\t#%s\n", $1,$2,$3,$4}' > /etc/services
The FIX test is because some of those lines have embedded newlines - which can be a pain in awk.
sed -i '/6347\/udp/a redis 6379\/tcp' /etc/services
good luck!
update
sed -i '/6347\/udp/a redis \t\t 6379\/tcp' /etc/services
looks better ...
update 2
lol
sed -i ''"$(echo $(echo $(grep -n $(awk {'print$2'} /etc/services | awk -F "/" '$1<6379'{'print$1'} | tail -1) /etc/services | awk -F ':' {'print$1'}|tail -1) + 1)|bc)"'i redis \t\t 6379\/tcp' /etc/services
I like the in-place substitution of sed, so I combine it with an awk search for the next line in services
R="redis\t\t6379/tcp\t\t\t# data structure server" N=$(awk '{if($2+0 == 6379) exit(1);if ($2+0 > 6379) {print $0;exit(0)}}' /etc/services) && sed -i "s~$N~$R\n$N~" /etc/services

syntax error: operand expected (error token is ">= 75 ")

#!/bin/bash
CURRENT=$(df -h / | grep / | awk '{ print $4}')
THRESHOLD=75
if (( "$CURRENT" >= "$THRESHOLD" )); then
mail -s "CENTOS-6 localhost 10.10.1.238 Disk Space Alert" sss@abc.net << EOF
Your root partition remaining free space is critically low. Used: $CURRENT%
EOF
fi
I get the following error when I run the script: syntax error: operand expected (error token is ">= 75 ")
It's because CURRENT will contain a percent sign, so it won't be a valid operand for the comparison operation.
You can remove the last character like this :
CURRENT=${CURRENT%?};
Also make sure that df -h / | grep / | awk '{ print $4}' really returns the usage percentage; on most systems you have to use print $5.
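The suffix-stripping step can be checked on its own. A minimal sketch, with a made-up sample value standing in for the df field; it uses ${CURRENT%\%}, which removes only a trailing percent sign, where ${CURRENT%?} removes whatever the last character happens to be:

```shell
CURRENT="82%"            # made-up sample value in the form df prints
THRESHOLD=75
CURRENT=${CURRENT%\%}    # strip one trailing % if present; CURRENT is now 82
if [ "$CURRENT" -ge "$THRESHOLD" ]; then
    status="alert"
else
    status="ok"
fi
echo "$status"           # prints alert
```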
A couple of things:
you don't need grep at all; awk is quite capable of doing its own regex matching.
if you search for / in the df output, you'll probably get most lines as most mounts have a / somewhere in them. If you just want the root mountpoint, you can use <space>/$.
Check that 4 is the correct field number, on my box it's 5.
In any case, that field is of the form 55% which will not be considered numeric. You can use gsub to get rid of it.
With that in mind, the following snippet can be used to get the percentage:
df -h | awk '$0 ~ / \/$/ { gsub("%","",$5); print $5 }'
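The awk filter can be exercised against canned df output to confirm that only the root mount point matches. The sample listing and the helper name root_usage are made up for this sketch; on a real system the field number still needs checking.

```shell
# root_usage: hypothetical helper; reads df-style output on stdin and prints
# field 5 without its % sign for the line that ends in " /".
root_usage() {
    awk '$0 ~ / \/$/ { gsub("%", "", $5); print $5 }'
}

usage=$(root_usage <<'EOF'
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   41G  6.5G  87% /
tmpfs           3.9G     0  3.9G   0% /dev/shm
/dev/sdb1       200G  120G   80G  60% /home
EOF
)
echo "$usage"   # prints 87
```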
And, just as an aside, I'm not that big a fan of here-docs in shell scripts since it either (1) screws up my nicely indented files; or (2) makes me burn half an hour while I try to remember the various syntax options which will allow indented EOF strings :-)
I prefer something like:
(
echo "Your root partition remaining free space is critically low: Used: ${CURRENT}%"
) | mail -s "CENTOS-6 localhost 10.10.1.238 Disk Space Alert" sss@abc.net
Especially since that means I can put arbitrarily complex commands in the sub-shell to generate whatever info I want in the mail message (rather than just simple text substitutions).
So, bottom line, I'd be looking at something more like:
#!/usr/bin/env bash
# Config section.
LIMIT=75
# Code section.
CURR=$(df -h | awk '$0 ~ / \/$/ { gsub("%","",$5); print $5 }')
if [[ ${CURR} -ge ${LIMIT} ]] ; then
(
echo "Your root partition remaining free space is critically low: Used: ${CURR}%"
) | mail -s "CENTOS-6 localhost 10.10.1.238 Disk Space Alert" sss@abc.net
fi
Just try:
CURRENT=$(df -h | awk '{print $4}' | sort -n | tail -n1 | sed 's/%//g')
THRESHOLD=90
if [ "$THRESHOLD" -gt "$CURRENT" ]

Using awk with Operations on Variables

I'm trying to write a Bash script that reads files with several columns of data and multiplies each value in the second column by each value in the third column, adding the results of all those multiplications together.
For example if the file looked like this:
Column 1 Column 2 Column 3 Column 4
genome 1 30 500
genome 2 27 500
genome 3 83 500
...
The script should multiply 1*30 to give 30, then 2*27 to give 54 (and add that to 30), then 3*83 to give 249 (and add that to 84) etc..
I've been trying to use awk to parse the input file but am unsure of how to get the operation to proceed line by line. Right now it stops after the first line is read and the operations on the variables are performed.
Here's what I've written so far:
for file in fileone filetwo
do
set -- $(awk '/genome/ {print $2,$3}' $file.hist)
var1=$1
var2=$2
var3=$((var1*var2))
total=$((total+var3))
echo var1 \= $var1
echo var2 \= $var2
echo var3 \= $var3
echo total \= $total
done
I tried placing a "while read" loop around everything but could not get the variables to update with each line. I think I'm going about this the wrong way!
I'm very new to Linux and Bash scripting so any help would be greatly appreciated!
That's because awk reads the entire file and runs its program on each line. So the output you get from awk '/genome/ {print $2,$3}' $file.hist will look like
1 30
2 27
3 83
and so on, which means in the bash script, the set command makes the following variable assignments:
$1 = 1
$2 = 30
$3 = 2
$4 = 27
$5 = 3
$6 = 83
etc. But you only use $1 and $2 in your script, meaning that the rest of the file's contents - everything after the first line - is discarded.
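The effect is easy to reproduce with printf standing in for awk. A small sketch using the three value pairs from the example:

```shell
# Simulate awk printing one "col2 col3" pair per line. The unquoted command
# substitution is split on whitespace, so every number becomes its own
# positional parameter.
set -- $(printf '1 30\n2 27\n3 83\n')
count=$#      # 6 parameters, not 3 pairs
third=$3      # 2: column 2 of the SECOND line, which the script never reads
echo "count=$count first=$1 second=$2 third=$third"
```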
Honestly, unless you're doing this just to learn how to use bash, I'd say just do it in awk. Since awk automatically runs over every line in the file, it'll be easy to multiply columns 2 and 3 and keep a running total.
awk '/genome/ { total += $2 * $3 } ENDFILE { print total; total = 0 }' fileone.hist filetwo.hist
Here ENDFILE is a special pattern (a GNU awk extension, so this needs gawk) that means "run this block at the end of each input file, not on each line."
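Where gawk is not available, per-file totals can be approximated in plain awk by flushing the total whenever FNR resets to 1. This is a sketch with made-up sample files in a scratch directory; it assumes no input file is empty, since an empty file never triggers the FNR == 1 rule.

```shell
dir=$(mktemp -d)
printf 'genome 1 30 500\ngenome 2 27 500\ngenome 3 83 500\n' > "$dir/fileone.hist"
printf 'genome 2 10 500\ngenome 4 5 500\n' > "$dir/filetwo.hist"

totals=$(awk '
    FNR == 1 && NR > 1 { print total; total = 0 }   # first line of a later file: flush previous total
    /genome/           { total += $2 * $3 }
    END                { print total }              # flush the total of the last file
' "$dir/fileone.hist" "$dir/filetwo.hist")
echo "$totals"
rm -rf "$dir"
```

For these two files the totals are 333 (30 + 54 + 249) and 40 (20 + 20), one per line.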
If you are doing this for educational purposes, let me say this: the only thing you need to know about doing arithmetic in bash is that you should never do arithmetic in bash :-P Seriously though, when you want to manipulate numbers, bash is one of the least well-adapted tools for that job. But if you really want to know, I can edit this to include some information on how you could do this task primarily in bash.
I agree that awk is in general better suited for this kind of work, but if you are curious what a pure bash implementation would look like:
for f in file1 file2; do
total=0
while read -r _ x y _; do
((total += x * y))
done < "$f"
echo "$total"
done

