awk modify printf output - bash

I need to parse the output of ldapsearch and only keep the attributes with numeric values.
I also need to transform the output to make it usable in Prometheus monitoring.
This is the output of a raw ldapsearch:
# 389, snmp, monitor
dn: cn=389,cn=snmp,cn=monitor
cn: 389
objectClass: top
objectClass: extensibleObject
anonymousbinds: 9
unauthbinds: 9
simpleauthbinds: 122256
strongauthbinds: 0
bindsecurityerrors: 27869
inops: 24501385
readops: 17933653
compareops: 24852
addentryops: 14205
removeentryops: 0
modifyentryops: 378287
modifyrdnops: 0
listops: 0
searchops: 19194674
onelevelsearchops: 117
wholesubtreesearchops: 1260904
referrals: 0
chainings: 0
securityerrors: 2343
errors: 4694375
connections: 1075
connectionseq: 4720927
bytesrecv: 1608469180
bytessent: -424079608
entriesreturned: 19299393
referralsreturned: 0
I execute this query in order to remove the fields that are not numerical, and also the dn/cn fields even though they contain numbers, e.g. cn=389.
${LDAPSEARCH} -LLL -H ${LDAP_URI} -x -D "${BINDDN}" -w ${LDAP_PASSWD} -b "${cn}" -s base | sed '/^cn\|^dn/d' | awk -F: '{ if ( $1 != "connection" && $2 ~ /[[:digit:]$]/) printf "dsee_%s\n", $1 $2}'
But I need to modify the printf so that it prints the fields like this:
dsee_modifyrdnops{node="vm1",cn="389"} 0
dsee_listops{node="vm1",cn="1389"} 0
dsee_strongauthbinds{node="vm1",cn="389"} 0
dsee_readops{node="vm1",cn="389"} 37194588
I have difficulties adding the curly brackets and quotes to the printf command.
What would be the best way to improve the awk/sed command and modify the printf output?

In plain bash:
#!/bin/bash
node=vm1
while IFS=: read -r key val; do
    [[ $key = cn ]] && { cn=${val# }; continue; }
    if [[ $val =~ ^\ -?[0-9]+(\.[0-9]*)?$ ]]; then
        printf 'dsee_%s{node="%s",cn="%s"}%s\n' "$key" "$node" "$cn" "$val"
    fi
done < <( your_raw_ldapsearch_command )
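Here your_raw_ldapsearch_command is just the query from the question, so the process substitution would be:

done < <( ${LDAPSEARCH} -LLL -H ${LDAP_URI} -x -D "${BINDDN}" -w ${LDAP_PASSWD} -b "${cn}" -s base )

(The ${cn} passed to -b is whatever base-DN variable your own script already uses; it is unrelated to the cn value the loop extracts.) Note that $val still carries the leading space from the "attribute: value" line, which is what puts the blank between the closing brace and the number in the output.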

Something along these lines:
$ cat tst.awk
BEGIN {
    FS=":[[:blank:]]*"
    qq="\""
    node="vm1"
}
$1=="cn" {cn=$2}
$1!~/^((cn|dn)$|connection)/ && $2~/^[[:digit:]]+$/ {
    printf("dsee_%s{node=%s%s%s,cn=%s%s%s} %d\n", $1, qq, node, qq, qq, cn, qq, $2)
}
$ awk -f tst.awk myFile
dsee_anonymousbinds{node="vm1",cn="389"} 9
dsee_unauthbinds{node="vm1",cn="389"} 9
dsee_simpleauthbinds{node="vm1",cn="389"} 122256
dsee_strongauthbinds{node="vm1",cn="389"} 0
dsee_bindsecurityerrors{node="vm1",cn="389"} 27869
dsee_inops{node="vm1",cn="389"} 24501385
dsee_readops{node="vm1",cn="389"} 17933653
dsee_compareops{node="vm1",cn="389"} 24852
dsee_addentryops{node="vm1",cn="389"} 14205
dsee_removeentryops{node="vm1",cn="389"} 0
dsee_modifyentryops{node="vm1",cn="389"} 378287
dsee_modifyrdnops{node="vm1",cn="389"} 0
dsee_listops{node="vm1",cn="389"} 0
dsee_searchops{node="vm1",cn="389"} 19194674
dsee_onelevelsearchops{node="vm1",cn="389"} 117
dsee_wholesubtreesearchops{node="vm1",cn="389"} 1260904
dsee_referrals{node="vm1",cn="389"} 0
dsee_chainings{node="vm1",cn="389"} 0
dsee_securityerrors{node="vm1",cn="389"} 2343
dsee_errors{node="vm1",cn="389"} 4694375
dsee_bytesrecv{node="vm1",cn="389"} 1608469180
dsee_entriesreturned{node="vm1",cn="389"} 19299393
dsee_referralsreturned{node="vm1",cn="389"} 0
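If the qq helper variable feels noisy, the quotes can also be escaped directly in the format string; this is purely a cosmetic variation of the script above (saved here as tst2.awk only for illustration):

$ cat tst2.awk
BEGIN { FS=":[[:blank:]]*"; node="vm1" }
$1=="cn" {cn=$2}
$1!~/^((cn|dn)$|connection)/ && $2~/^[[:digit:]]+$/ {
    printf("dsee_%s{node=\"%s\",cn=\"%s\"} %d\n", $1, node, cn, $2)
}

It produces exactly the same output.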

Related

Generic "append to file if not exists" function in Bash

I am trying to write a util function in a bash script that can take a multi-line string and append it to the supplied file if it does not already exist.
This works fine using grep if the pattern does not contain \n.
if grep -qF "$1" $2
then
return 1
else
echo "$1" >> $2
fi
Example usage
append 'sometext\nthat spans\n\tmultiple lines' ~/textfile.txt
I am on macOS, by the way, which has presented some problems: some of the solutions I've seen posted elsewhere are very Linux-specific. I'd also like to avoid installing any other tools to achieve this if possible.
Many thanks
If the files are small enough to slurp into a Bash variable (you should be OK up to a megabyte or so on a modern system), and don't contain NUL (ASCII 0) characters, then this should work:
IFS= read -r -d '' contents <"$2"
if [[ "$contents" == *"$1"* ]]; then
return 1
else
printf '%s\n' "$1" >>"$2"
fi
In practice, the speed of Bash's built-in pattern matching might be more of a limitation than the ability to slurp the file contents.
See the accepted, and excellent, answer to Why is printf better than echo? for an explanation of why I replaced echo with printf.
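Wrapped into the append function from the question, it might look like the sketch below; this assumes the target file already exists and that the text is passed with real newlines (e.g. $'...' quoting), since '\n' inside ordinary single quotes stays a literal backslash-n:

append() {
    local contents
    IFS= read -r -d '' contents <"$2"   # slurp the whole file; read returns non-zero at EOF, which is fine here
    if [[ "$contents" == *"$1"* ]]; then
        return 1
    else
        printf '%s\n' "$1" >>"$2"
    fi
}

append $'sometext\nthat spans\n\tmultiple lines' ~/textfile.txt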
Using awk:
awk '
BEGIN {
    n = 0    # length of pattern in lines
    m = 0    # number of matching lines
}
NR == FNR {
    pat[n++] = $0
    next
}
{
    if ($0 == pat[m])
        m++
    else if (m > 0 && $0 == pat[0])
        m = 1
    else
        m = 0
}
m == n {
    exit
}
END {
    if (m < n) {
        for (i = 0; i < n; i++)
            print pat[i] >>FILENAME
    }
}
' - "$2" <<EOF
$1
EOF
If necessary, one would need to properly escape any metacharacters inside FS / OFS:
jot 7 9 |
{m,g,n}awk 'BEGIN { FS = OFS = "11\n12\n13\n"
_^= RS = (ORS = "") "^$" } _<NF || ++NF'
9
10
11
12
13
14
15
jot 7 -2 | (... awk stuff ...)
-2
-1
0
1
2
3
4
11
12
13

Shell script with grep and sed to extract individuals from a pair after comparing the numerical values of a variable

I want to compare a group of words (individuals) in pairs and extract the one with the lowest numeric variable. My files and scripts look like this.
Relatedness_3rdDegree.txt (example):
Individual1 Individual2
Individual5 Individual23
Individual50 Individual65
filename.imiss
INDV N_DATA N_GENOTYPES_FILTERED N_MISS F_MISS
Individual1 375029 0 782 0.00208517
Individual2 375029 0 341 0.000909263
Individual3 375029 0 341 0.000909263
Main script:
numlines=$(wc -l Relatedness_3rdDegree.txt|awk '{print $1}')
for line in `seq 1 $numlines`
do
ind1=$(sed -n "${line}p" Relatedness_3rdDegree.txt|awk '{print $1}')
ind2=$(sed -n "${line}p" Relatedness_3rdDegree.txt|awk '{print $2}')
miss1=$(grep $ind1 filename.imiss|awk '{print $5}')
miss2=$(grep $ind2 filename.imiss|awk '{print $5}')
if echo "$miss1 > $miss2" | bc -l | grep -q 1
then
echo $ind1 >> miss.txt
else
echo $ind2 >> miss.txt
fi
echo "$line / $numlines"
done
This last script will echo a series of lines like this:
1 / 208
2 / 208
3 / 208
and so on, until getting to this error:
91 / 208
(standard_in) 1: syntax error
92 / 208
(standard_in) 1: syntax error
93 / 208
If I go to my output (miss.txt), the printed individuals are not correct.
It should print the individuals, within the pairs contained in the file "Relatedness_3rdDegree.txt", that have the lowest value of F_MISS (column $5 of the "filename.imiss").
For instance, in the pair "Individual1 Individual2", it should compare their values of F_MISS and print only the individual with the lowest value, which in this example would be Individual 2.
I have manually checked the values and the printed individuals, and it looks like it printed random individuals for each pair.
What is wrong in this script?
Bash version:
#!/bin/bash
declare -A imiss
while read -r ind nd ngf nm fm # we'll ignore most of these
do
    imiss[$ind]=$fm
done < filename.imiss
while read -r i1 i2
do
    if (( $(echo "${imiss[$i1]} > ${imiss[$i2]}" | bc -l) ))
    then
        echo "$i1"
    else
        echo "$i2"
    fi
done < Relatedness_3rdDegree.txt
Run* it like:
bash-imiss
AWK version:
#!/usr/bin/awk -f
NR == FNR {imiss[$1] = $5; next}
{
if (imiss[$1] > imiss[$2]) {
print $1
} else {
print $2
}
}
Run* it like:
awk-imiss filename.imiss Relatedness_3rdDegree.txt
These two scripts do exactly the same thing in exactly the same way using associative arrays.
* This assumes that you have set the script file executable using chmod and that it's in your PATH and that the data files are in your current directory.
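As for why the original loop misbehaves: a likely cause (an assumption, since the full data is not shown) is that grep $ind1 filename.imiss matches the name as a substring, so Individual1 also matches Individual10, Individual11 and so on; miss1 then holds several values (or none, if an individual is missing), and the bc expression becomes invalid, which would produce the (standard_in) 1: syntax error you see. If you want to keep the per-pair lookup instead of the associative-array approach, matching on the whole first field avoids that:

miss1=$(awk -v id="$ind1" '$1 == id {print $5}' filename.imiss)
miss2=$(awk -v id="$ind2" '$1 == id {print $5}' filename.imiss)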

Matching a number against a comma-separated sequence of ranges

I'm writing a bash script which takes a number, and also a comma-separated sequence of values and strings, e.g.: 3,15,4-7,19-20. I want to check whether the number is contained in the set corresponding to the sequence. For simplicity, assume no comma-separated elements intersect, and that the elements are sorted in ascending order.
Is there a simple way to do this in bash other than the brute-force naive way? Perhaps some shell utility that does something like that for me, maybe something related to lpr, which already knows how to process page-range sequences?
Is awk cheating?:
$ echo -n 3,15,4-7,19-20 |
awk -v val=6 -v RS=, -F- '(NF==1&&$1==val) || (NF==2&&$1<=val&&$2>=val)' -
Output:
4-7
Another version:
$ echo 19 |
awk -v ranges=3,15,4-7,19-20 '
BEGIN {
split(ranges,a,/,/)
}
{
for(i in a) {
n=split(a[i],b,/-/)
if((n==1 && $1==a[i]) || (n==2 && $1>=b[1] && $1<=b[2]))
print a[i]
}
}' -
Outputs:
19-20
The latter is better as you can feed it more values from a file etc. Then again the former is shorter. :D
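For instance, the second version can check several values in one pass; a small variation (only the print is changed, to show which value matched):

$ printf '%s\n' 3 6 19 |
awk -v ranges=3,15,4-7,19-20 '
BEGIN {
split(ranges,a,/,/)
}
{
for(i in a) {
n=split(a[i],b,/-/)
if((n==1 && $1==a[i]) || (n==2 && $1>=b[1] && $1<=b[2]))
print $1, "->", a[i]
}
}' -
3 -> 3
6 -> 4-7
19 -> 19-20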
Pure bash:
check() {
    IFS=, a=($2)                     # split the sequence on commas
    for b in "${a[@]}"; do
        IFS=- c=($b); c+=(${c[0]})   # split "lo-hi"; a single value becomes lo==hi
        (( $1 >= c[0] && $1 <= c[1] )) && break
    done
}
$ check 6 '3,15,4-7,19-20' && echo "yes" || echo "no"
yes
$ check 42 '3,15,4-7,19-20' && echo "yes" || echo "no"
no
As bash is tagged, why not just
inrange() { for r in ${2//,/ }; do ((${r%-*}<=$1 && $1<=${r#*-})) && break; done; }
Then test it as usual:
$ inrange 6 3,15,4-7,19-20 && echo yes || echo no
yes
$ inrange 42 3,15,4-7,19-20 && echo yes || echo no
no
A function based on @JamesBrown's method:
function match_in_range_seq {
(( $# == 2 )) && [[ -n "$(echo -n "$2" | awk -v val="$1" -v RS=, -F- '(NF==1&&$1==val) || (NF==2&&$1<=val&&$2>=val)' - )" ]]
}
Will return 0 (in $?) if the second argument (the range sequence) contains the first argument, 1 otherwise.
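For example, exercised the same way as the earlier answers:

$ match_in_range_seq 6 '3,15,4-7,19-20' && echo yes || echo no
yes
$ match_in_range_seq 42 '3,15,4-7,19-20' && echo yes || echo no
no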
Another awk idea using two input (-v) variables:
# use of function wrapper is optional but cleaner for the follow-on test run
in_range() {
awk -v value="$1" -v r="$2" '
BEGIN { n=split(r,ranges,",")
for (i=1;i<=n;i++) {
low=high=ranges[i]
if (ranges[i] ~ "-") {
split(ranges[i],x,"-")
low=x[1]
high=x[2]
}
if (value >= low && value <= high) {
print value,"found in the range:",ranges[i]
exit
}
}
}'
}
NOTE: the exit assumes no overlapping ranges, i.e., a value will not be found in more than one range.
Take for a test spin:
ranges='3,15,4-7,19-20'
for value in 1 6 15 32
do
echo "########### value = ${value}"
in_range "${value}" "${ranges}"
done
This generates:
########### value = 1
########### value = 6
6 found in the range: 4-7
########### value = 15
15 found in the range: 15
########### value = 32
NOTES:
OP did not mention what to generate as output if no range match is found; the code could be modified to output a 'not found' message as needed (a sketch follows after these notes)
in a comment OP mentioned possibly running the search for a number of values; the code could be modified to support such a requirement but would need more input (e.g., format of the list of values, desired output, and how it is to be used/captured by the calling process)
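On the first point, a 'not found' message is a small change; an untested sketch of the same function with that addition:

in_range() {
  awk -v value="$1" -v r="$2" '
  BEGIN { n=split(r,ranges,",")
          for (i=1;i<=n;i++) {
              low=high=ranges[i]
              if (ranges[i] ~ "-") {
                  split(ranges[i],x,"-")
                  low=x[1]
                  high=x[2]
              }
              if (value >= low && value <= high) {
                  print value,"found in the range:",ranges[i]
                  exit                                   # stop at the first match (no overlapping ranges assumed)
              }
          }
          print value,"not found in any range"           # only reached when the loop finds no match
        }'
}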

Datetime to epoch conversion

I have a bash question (when using awk). I'm extracting every instance of the first and fifth columns in a text file and piping them to a new file with the following code:
cut -f4 test170201.rawtxt | awk '/stream_0/ { print $1, $5 }' > testLogFile.txt
This is part of the file (test170201.rawtxt) I'm extracting the data from, columns Timestamp and Loss,
Timestamp Stream Status Seq Loss Bytes Delay
17/02/01.10:58:25.212577 stream_0 OK 80281 0 1000 38473
17/02/01.10:58:25.213401 stream_0 OK 80282 0 1000 38472
17/02/01.10:58:25.215560 stream_0 OK 80283 0 1000 38473
17/02/01.10:58:25.216645 stream_0 OK 80284 0 1000 38472
This is the result I'm getting in testLogFile.txt
17/02/01.10:58:25.212577 0
17/02/01.10:58:25.213401 0
17/02/01.10:58:25.215560 0
17/02/01.10:58:25.216645 0
However, I want the Timestamp to be written as an epoch timestamp in the file above. Is there an easy way of modifying the code I already have to do this?
Given:
$ cat file
Timestamp Stream Status Seq Loss Bytes Delay
17/02/01.10:58:25.212577 stream_0 OK 80281 0 1000 38473
17/02/01.10:58:25.213401 stream_0 OK 80282 0 1000 38472
17/02/01.10:58:25.215560 stream_0 OK 80283 0 1000 38473
17/02/01.10:58:25.216645 stream_0 OK 80284 0 1000 38472
You can write a Bash script to do what you are looking for (this one uses BSD/macOS date; a GNU date variant follows below):
while IFS= read -r line || [[ -n "$line" ]]; do
if [[ "$line" =~ ^[[:digit:]]{2}/[[:digit:]]{2}/[[:digit:]]{2} ]]
then
arr=($line)
ts=${arr[0]}
dec=${ts##*.} # fractional seconds
# GNU date may need different flags:
epoch=$(date -j -f "%y/%m/%d.%H:%M:%S" "${ts%.*}" "+%s")
printf "%s.%s\t%s\n" "$epoch" "$dec" "${arr[4]}"
fi
done <file >out_file
$ cat out_file
1485975505.212577 0
1485975505.213401 0
1485975505.215560 0
1485975505.216645 0
For GNU date, try:
while IFS= read -r line || [[ -n "$line" ]]; do
if [[ "$line" =~ ^[[:digit:]]{2}/[[:digit:]]{2}/[[:digit:]]{2} ]]
then
arr=($line)
ts="20${arr[0]}"
d="${ts%%.*}"
tmp="${ts%.*}"
tm="${tmp#*.}"
dec="${ts##*.}" # fractional seconds
epoch=$(date +"%s" --date="$d $tm" )
printf "%s.%s\t%s\n" "$epoch" "$dec" "${arr[4]}"
fi
done <file >out_file
For an GNU awk solution, you can do:
awk 'function epoch(s){
split(s, dt, /[/:. ]/)
s="20" dt[1] " " dt[2] " " dt[3] " " dt[4] " " dt[5] " " dt[6]
return mktime(s) "." dt[7]}
/^[0-9][0-9]/ { print epoch($1), $5 }' file >out_file
If you don't want the fractional seconds included in the epoch, they are easily removed.
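For instance, a minimal variation of the function above that returns the mktime() result on its own:

awk 'function epoch(s){
       split(s, dt, /[\/:. ]/)
       return mktime("20" dt[1] " " dt[2] " " dt[3] " " dt[4] " " dt[5] " " dt[6])}
     /^[0-9][0-9]/ { print epoch($1), $5 }' file >out_file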
awk -F '[.[:blank:]]+' '
# use separator for dot and space (to avoid trailing time info)
{
# for line other than header
if( NR>1) {
# time is set for format "YYYY MM DD HH MM SS [DST]"
# prepare with valuable info
T = "20"$1 " " $2
# use correct separator
gsub( /[\/:]/, " ", T)
# convert to epoch
E = mktime( T)
# print result, adding fractionnal as mentionned later
printf("%d.%d %s\n", E, $3, $7)
}
else {
# print header (line 1)
print $1 " "$7
}
}
' test170201.rawtxt \
> Redirected.file
The code is self-commented and deliberately verbose for clarity.
It uses GNU awk for the mktime function, which is not available in POSIX awk or older versions.
A somewhat more compact one-liner:
awk -F '[.[:blank:]]+' '{if(NR>1){T="20"$1" "$2;gsub(/[\/:]/," ", T);$1=mktime(T)}print $1" "$7}' test170201.rawtxt
Using GNU awk
Input
$ cat f
Timestamp Stream Status Seq Loss Bytes Delay
17/02/01.10:58:25.212577 stream_0 OK 80281 0 1000 38473
17/02/01.10:58:25.213401 stream_0 OK 80282 0 1000 38472
17/02/01.10:58:25.215560 stream_0 OK 80283 0 1000 38473
17/02/01.10:58:25.216645 stream_0 OK 80284 0 1000 38472
Output
$ awk '
BEGIN{cyear = strftime("%y",systime())}
function epoch(v, datetime){
sub(/\./," ",v);
split(v,datetime,/[/: ]/);
datetime[1] = datetime[1] <= cyear ? 2000+datetime[1] : 1900+datetime[1];
return mktime(datetime[1] " " datetime[2] " " datetime[3] " " datetime[4]" " datetime[5]" " datetime[6])
}
/stream_0/{
print epoch($1),$5
}' f
1485926905 0
1485926905 0
1485926905 0
1485926905 0
To write to a new file, just redirect as below:
cut -f4 test170201.rawtxt | awk '
BEGIN{cyear = strftime("%y",systime());}
function epoch(v, datetime){
sub(/\./," ",v);
split(v,datetime,/[/: ]/);
datetime[1] = datetime[1] <= cyear ? 2000+datetime[1] : 1900+datetime[1];
return mktime(datetime[1] " " datetime[2] " " datetime[3] " " datetime[4]" " datetime[5]" " datetime[6])
}
/stream_0/{
print epoch($1),$5
}' > testLogFile.txt

bash routine to return the page number of a given line number from text file

Consider a plain text file containing page-breaking ASCII control character "Form Feed" ($'\f'):
alpha\n
beta\n
gamma\n\f
one\n
two\n
three\n
four\n
five\n\f
earth\n
wind\n
fire\n
water\n\f
Note that each page has a random number of lines.
I need a bash routine that returns the page number of a given line number in a text file containing the page-breaking ASCII control character.
After a long time researching the solution I finally came across this piece of code:
function get_page_from_line
{
local nline="$1"
local input_file="$2"
local npag=0
local ln=0
local total=0
while IFS= read -d $'\f' -r page; do
npag=$(( ++npag ))
ln=$(echo -n "$page" | wc -l)
total=$(( total + ln ))
if [ $total -ge $nline ]; then
echo "${npag}"
return
fi
done < "$input_file"
echo "0"
return
}
But, unfortunately, this solution proved to be very slow in some cases.
Any better solution ?
Thanks!
The idea of using read -d $'\f' and then counting the lines is good.
This version might not look elegant: whenever the requested line exists (nline is less than or equal to the number of lines in the file), the file is read twice.
Give it a try, because it is super fast:
function get_page_from_line ()
{
local nline="${1}"
local input_file="${2}"
if [[ $(wc -l "${input_file}" | awk '{print $1}') -lt nline ]] ; then
printf "0\n"
else
printf "%d\n" $(( $(head -n ${nline} "${input_file}" | grep -c "^"$'\f') + 1 ))
fi
}
The performance of awk is better than that of the bash version above; awk was created for exactly this kind of text processing.
Give this tested version a try:
function get_page_from_line ()
{
awk -v nline="${1}" '
BEGIN {
npag=1;
}
{
if (index($0,"\f")>0) {
npag++;
}
if (NR==nline) {
print npag;
linefound=1;
exit;
}
}
END {
if (!linefound) {
print 0;
}
}' "${2}"
}
When \f is encountered, the page number is increased.
NR is the current line number.
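It is called the same way as the bash version; for example, with the question's sample saved as pages.txt (a hypothetical file name), line 7 falls on page 2:

$ get_page_from_line 7 pages.txt
2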
----
For the record, there is another bash version.
This version uses only built-in commands to count the lines in the current page.
The speedtest.sh you provided in the comments showed it is only slightly ahead (approx. 20 sec), which makes it roughly equivalent to your version:
function get_page_from_line ()
{
local nline="$1"
local input_file="$2"
local npag=0
local total=0
while IFS= read -d $'\f' -r page; do
npag=$(( npag + 1 ))
IFS=$'\n'
for line in ${page}
do
total=$(( total + 1 ))
if [[ total -eq nline ]] ; then
printf "%d\n" ${npag}
unset IFS
return
fi
done
unset IFS
done < "$input_file"
printf "0\n"
return
}
awk to the rescue!
awk -v RS='\f' -v n=09 '$0~"^"n"." || $0~"\n"n"." {print NR}' file
3
Updated the anchoring as discussed in the comments.
$ for i in $(seq -w 12); do awk -v RS='\f' -v n="$i" '$0~"^"n"." || $0~"\n"n"." {print n,"->",NR}' file; done
01 -> 1
02 -> 1
03 -> 1
04 -> 2
05 -> 2
06 -> 2
07 -> 2
08 -> 2
09 -> 3
10 -> 3
11 -> 3
12 -> 3
A script of similar length can be written in bash itself to locate and respond to the embedded <form-feed>s contained in a file. (It will work in a POSIX shell as well, with a substitute for the string indexing and expr for the math.) For example,
#!/bin/bash
declare -i ln=1 ## line count
declare -i pg=1 ## page count
fname="${1:-/dev/stdin}" ## read from file or stdin
printf "\nln:pg text\n" ## print header
while read -r l; do ## read each line
if [ ${l:0:1} = $'\f' ]; then ## if form-feed found
((pg++))
printf "<ff>\n%2s:%2s '%s'\n" "$ln" "$pg" "${l:1}"
else
printf "%2s:%2s '%s'\n" "$ln" "$pg" "$l"
fi
((ln++))
done < "$fname"
Example Input File
The simple input file with embedded <form-feed>s was created with:
$ echo -e "a\nb\nc\n\fd\ne\nf\ng\nh\n\fi\nj\nk\nl" > dat/affex.txt
Which when output gives:
$ cat dat/affex.txt
a
b
c
d
e
f
g
h
i
j
k
l
Example Use/Output
$ bash affex.sh <dat/affex.txt
ln:pg text
1: 1 'a'
2: 1 'b'
3: 1 'c'
<ff>
4: 2 'd'
5: 2 'e'
6: 2 'f'
7: 2 'g'
8: 2 'h'
<ff>
9: 3 'i'
10: 3 'j'
11: 3 'k'
12: 3 'l'
With Awk, you can set RS (the record separator, default newline) to form feed (\f) and FS (the input field separator, default any sequence of horizontal whitespace) to newline (\n), and obtain the number of lines as the number of "fields" in a "record", which is a "page".
The placement of form feeds in your data will produce some empty lines within a page so the counts are off where that happens.
awk -F '\n' -v RS='\f' '{ print NF }' file
You could reduce the number by one if $NF == "", and perhaps pass in the number of the desired page as a variable:
awk -F '\n' -v RS='\f' -v p="2" 'NR==p { print NF - ($NF == "") }' file
To obtain the page number for a particular line, just feed head -n number to the script, or loop over the numbers until you have accrued the sum of lines.
line=1
page=1
for count in $(awk -F '\n' -v RS='\f' '{ print NF - ($NF == "") }' file); do
    old=$line
    ((line += count))
    echo "Lines $old through $((line - 1)) are on page $page"
    ((page++))
done
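The head -n route mentioned above can be made just as direct: take the first N lines and count how many form-feed-separated records they span. A sketch, using the question's sample file again:

$ line=7
$ head -n "$line" file | awk -v RS='\f' 'END { print NR }'
2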
This gnu awk script prints the "page" for the linenumber given as command line argument:
BEGIN { ffcount=1;
search = ARGV[2]
delete ARGV[2]
if (!search ) {
print "Please provide linenumber as argument"
exit(1);
}
}
$1 ~ search { printf( "line %s is on page %d\n", search, ffcount) }
/[\f]/ { ffcount++ }
Use it like awk -f formfeeds.awk formfeeds.txt 05, where formfeeds.awk is the script, formfeeds.txt is the file, and '05' is a line number.
The BEGIN rule deals mostly with the command line argument. The other rules are simple rules:
$1 ~ search applies when the first field matches the command-line argument stored in search
/[\f]/ applies when there is a formfeed
