How to print the remaining columns using awk? - bash

Right now I have a command that prints my log file with a | delimiter between columns.
cat ambari-alerts.log | awk -F '[ ]' '{print $1 "|" $2 "|" $3 "|" $4 "|" $5 "|"}' |
grep "$(date +"%Y-%m-%d")"
Sample of the log file data is this:
2016-02-11 09:40:33,875 [OK] [MAPREDUCE2] [mapreduce_history_server_rpc_latency] (History Server RPC Latency) Average Queue Time:[0.0], Average Processing Time:[0.0]
The result of my command is this:
2016-02-11|09:40:33,875|[OK]|[MAPREDUCE2]|[mapreduce_history_server_rpc_latency]
I want to print the remaining columns. How can I do that? I tried this syntax adding $0, but unfortunately it just prints the whole line again.
awk -F '[ ]' '{print $1 "|" $2 "|" $3 "|" $4 "|" $5 "|" $0}'
Hope you can help me, newbie here in using awk.

This seems to be all you need:
$ awk '{for (i=1;i<=5;i++) sub(/ /,"|")} 1' file
2016-02-11|09:40:33,875|[OK]|[MAPREDUCE2]|[mapreduce_history_server_rpc_latency]|(History Server RPC Latency) Average Queue Time:[0.0], Average Processing Time:[0.0]
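If the number of leading columns varies, the loop bound can be passed in with -v instead of being hard-coded. A minimal sketch, assuming a hypothetical helper named pipe_first_n (not part of the answer above) and a shortened sample line:

```shell
# Hypothetical helper: replace the first n spaces of every input line with "|".
pipe_first_n() {
    awk -v n="$1" '{ for (i = 1; i <= n; i++) sub(/ /, "|") } 1'
}

# Shortened sample line, for illustration only.
sample='2016-02-11 09:40:33,875 [OK] [MAPREDUCE2] [rpc_latency] (History Server RPC Latency) Average Queue Time:[0.0]'
out=$(printf '%s\n' "$sample" | pipe_first_n 5)
printf '%s\n' "$out"
```

Because only the first n spaces are touched, the spaces inside the trailing free-text part of the line are left alone.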

This is a bit of a hassle with awk
awk -F '[ ]' '{
printf "%s|%s|%s|%s|%s|", $1, $2, $3, $4, $5
for (i=6; i<=NF; i++) printf "%s ", $i
print ""
}'
or, replace the first 5 spaces:
awk -F '[ ]' '{
sub(/ /, "|");sub(/ /, "|");sub(/ /, "|");sub(/ /, "|");sub(/ /, "|")
print
}'
This is actually easier in bash
while IFS=" " read -r a b c d e rest; do
echo "$a|$b|$c|$d|$e|$rest"
done < file.log
Folding in your grep:
awk -F '[ ]' -v date="$(date +%Y-%m-%d)" '
$0 ~ date {
printf "%s|%s|%s|%s|%s|", $1, $2, $3, $4, $5
for (i=6; i<=NF; i++) printf "%s ", $i
print ""
}'
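The date filter and the replace-the-first-five-spaces idea can also be combined. The following is a sketch only: format_for_date is a hypothetical name, the two log lines are made up, and the date is passed in as an argument so the example is repeatable (real use would pass "$(date +%Y-%m-%d)"). It anchors the match at the start of the line with index() rather than matching the date anywhere.

```shell
# Hypothetical helper: keep lines that start with the given date, then
# convert the first five spaces to "|" so the rest of the line is untouched.
format_for_date() {
    awk -v date="$1" 'index($0, date) == 1 {
        for (i = 1; i <= 5; i++) sub(/ /, "|")
        print
    }'
}

# Made-up two-line log, for illustration only.
out=$(printf '%s\n' \
    '2016-02-11 09:40:33,875 [OK] [HDFS] [check] (Some Check) value one' \
    '2016-02-10 23:59:59,001 [OK] [HDFS] [check] (Some Check) value two' |
    format_for_date 2016-02-11)
printf '%s\n' "$out"
```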

Here is some awk that provides a somewhat more generalized approach than brute-forcing the first 5 columns:
awk '{
for (i = 1; i < 6; i++)
printf "%s|", $i
for (i = 6; i <= NF; i++)
printf "%s%s", $i, (i < NF ? " " : "")
print ""
}' ambari-alerts.log | grep "$(date +"%Y-%m-%d")"

Related

Filter lines based on certain string and then print only some attributes greater

I have a big text file with millions of log lines.
I would like to filter all the lines that satisfy the following criteria:
url should be url=/v2/testB
totaltime value should be greater than 500
INFO|id=1|totaltime=5000|httpmethod=POST|url=/v1/testA
INFO|id=2|totaltime=200|httpmethod=POST|url=/v2/testB
INFO|id=3|totaltime=1000|httpmethod=POST|url=/v2/testB
INFO|id=4|totaltime=501|httpmethod=POST|url=/v2/testB
result:-
id=3,totaltime=1000
id=4,totaltime=501
I have tried chaining multiple awk commands and then adding an if block. Can this be done more quickly? Thanks!
while IFS= read -r line; do
value=`echo $line|grep "url=/v2/testB" | awk -F"totaltime=" '{ print $2}'| awk -F"|" '{ print $1}'`
if (( $value > 500 )); then
echo $line
fi
done < file.log
You may use this awk:
awk -F '|' -v OFS=, '$NF == "url=/v2/testB" {v=$3; sub(/^totaltime=/, "", v); if (v+0 > 500) print $2, $3}' file
id=3,totaltime=1000
id=4,totaltime=501
To make it more readable:
awk -F '|' -v OFS=, '
$NF == "url=/v2/testB" {
v = $3
sub(/^totaltime=/, "", v)
if (v+0 > 500)
print $2, $3
}' file
If you have gnu-awk then it can be reduced to:
awk -F '|' -v OFS=, '$NF == "url=/v2/testB" &&
gensub(/^totaltime=/, "", "1", $3)+0 > 500 {print $2, $3}' file
v+0 is an awk idiom that converts a string value to a number.
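A small sketch of why the +0 matters here: after sub() the variable holds a string, and comparing a string with a number falls back to string comparison. The variable names as_string and as_number are illustrative only.

```shell
out=$(awk 'BEGIN {
    v = "totaltime=1000"
    sub(/^totaltime=/, "", v)   # v now holds the string "1000"
    as_string = (v > 500)       # string comparison: "1000" sorts before "500"
    as_number = (v + 0 > 500)   # numeric comparison: 1000 > 500
    print as_string, as_number
}')
printf '%s\n' "$out"
```

The first result is 0 and the second is 1, so without the conversion the line with totaltime=1000 would be dropped.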
$ awk -F'|' -v OFS=',' '{split($3,t,/=/)} $5=="url=/v2/testB" && t[2]>500{print $2, $3}' file
id=3,totaltime=1000
id=4,totaltime=501
You seem to be in luck:
awk -F'|' 'BEGIN{FS="|"; OFS=","}
{ url = substr($NF,index($NF,"=")+1)
totaltime = substr($3,index($3,"=")+1)
}
(url == "/v2/testB") && (totaltime+0 > 500) { print $2,$3 }
' file
With your shown samples, please try following awk program.
awk -F'\\||totaltime=' '$NF=="url=/v2/testB" && $4>500{print $2",totaltime="$4}' Input_file
Explanation:
The -F option sets the field separator to the regex \||totaltime=, so every line of Input_file is split on | and on the string totaltime=.
In the main program, two conditions are checked:
a- $NF (the last field) is equal to url=/v2/testB, AND
b- the 4th field is greater than 500.
If both hold, print the 2nd field followed by the string ,totaltime= and the 4th field, which is the output the OP asked for.
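To see why the value ends up in the 4th field, it can help to dump every field with its index. This is only an inspection sketch using one of the sample lines; the [n:...] output format is made up for the demonstration.

```shell
line='INFO|id=3|totaltime=1000|httpmethod=POST|url=/v2/testB'
out=$(printf '%s\n' "$line" |
    awk -F'\\||totaltime=' '{ for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }')
printf '%s\n' "$out"
```

The | in front of totaltime= and the string totaltime= itself are two adjacent separators, so field 3 is empty and the number lands in field 4.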
All the awk solutions are great, and if awk is an option, use them.
If you wanted to fix your Bash effort, you can do:
while IFS='|' read -r id ti; do
[[ "${ti#*=}" -gt 500 ]] && printf "%s,%s\n" "$id" "$ti"
done < <(grep 'url=/v2/testB$' file | cut -d '|' -f 2,3)
Alternatively, you can eliminate cut and keep all five fields:
while IFS='|' read -r c1 c2 c3 c4 c5; do
[[ "${c3#*=}" -gt 500 ]] && printf "%s,%s\n" "$c2" "$c3"
done < <(grep 'url=/v2/testB$' file)
Either prints:
id=3,totaltime=1000
id=4,totaltime=501
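The answers above all rely on totaltime being the third field and url being the last. If the field order might change, one option is to look the values up by key name. A rough sketch, assuming every field after the first has the form key=value; the helper name filter_slow and its two arguments are hypothetical:

```shell
# Hypothetical helper: print id and totaltime for lines whose url equals $1
# and whose totaltime exceeds $2, wherever those keys appear on the line.
filter_slow() {
    awk -F'|' -v url="$1" -v min="$2" '{
        split("", kv)                       # clear the map for each line
        for (i = 2; i <= NF; i++) {
            eq = index($i, "=")
            if (eq) kv[substr($i, 1, eq - 1)] = substr($i, eq + 1)
        }
        if (kv["url"] == url && kv["totaltime"] + 0 > min + 0)
            print "id=" kv["id"] ",totaltime=" kv["totaltime"]
    }'
}

out=$(printf '%s\n' \
    'INFO|id=1|totaltime=5000|httpmethod=POST|url=/v1/testA' \
    'INFO|id=2|totaltime=200|httpmethod=POST|url=/v2/testB' \
    'INFO|id=3|totaltime=1000|httpmethod=POST|url=/v2/testB' \
    'INFO|id=4|totaltime=501|httpmethod=POST|url=/v2/testB' |
    filter_slow /v2/testB 500)
printf '%s\n' "$out"
```

This does more work per line than the positional versions, so on a file with millions of lines the fixed-field answers are likely to be faster.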

Iterating through Comma Separated rows in loop in Shell

Final.txt
Failed,2021-12-07 22:30 EST,Scheduled Backup,abc,/clients/FORD_1030PM_EST_Windows2008,Windows File System
Failed,2021-12-07 22:00 EST,Scheduled Backup,def,/clients/FORD_10PM_EST_Windows2008,Windows File System
I want to iterate through these rows instead of column
Expected Output
client=abc
client=def
group=/clients/FORD_1030PM_EST_Windows2008
group=/clients/FORD_10PM_EST_Windows2008
I tried this
while read line ; do
group=$(awk -F',' '{print $4}')
client=$(awk -F',' '{print $5}')
echo $group
echo $client
done < Final
It's not working, but when I run this on its own:
cat Final | awk -F',' '{print $4}'
it gives me the expected output; it just does not work when I try it in the loop.
With GNU awk:
awk -F ',' 'BEGINFILE{f++}
f==1{print "client=" $4}
f==2{print "group=" $5}
' Final Final
Output:
client=abc
client=def
group=/clients/FORD_1030PM_EST_Windows2008
group=/clients/FORD_10PM_EST_Windows2008
One-pass awk solution, storing field $5 in an array for printing at the end:
$ awk -F, '{print $4; groups[NR]=$5} END {for (i=1;i<=NR;i++) print groups[i]}' Final.txt
abc
def
/clients/FORD_1030PM_EST_Windows2008
/clients/FORD_10PM_EST_Windows2008
Two-pass awk that eliminates need to store field $5 in an array:
$ awk -F, 'FNR==NR {print $4;next} {print $5}' Final.txt Final.txt
abc
def
/clients/FORD_1030PM_EST_Windows2008
/clients/FORD_10PM_EST_Windows2008
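The expected output in the question carries client= and group= labels, which the two-pass form can print directly. A sketch using a temporary file with shortened, made-up group paths (/clients/A and /clients/B are placeholders, not the real values):

```shell
# Temporary stand-in for Final.txt, with shortened group paths.
tmp=$(mktemp)
printf '%s\n' \
    'Failed,2021-12-07 22:30 EST,Scheduled Backup,abc,/clients/A,Windows File System' \
    'Failed,2021-12-07 22:00 EST,Scheduled Backup,def,/clients/B,Windows File System' > "$tmp"

# First pass (FNR==NR) prints column 4, second pass prints column 5.
out=$(awk -F, 'FNR==NR { print "client=" $4; next } { print "group=" $5 }' "$tmp" "$tmp")
rm -f "$tmp"
printf '%s\n' "$out"
```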
with bash
declare -a client group
while IFS=, read -ra fields; do
client+=("${fields[3]}")
group+=("${fields[4]}")
done < Final
printf 'client=%s\n' "${client[@]}"
printf 'group=%s\n' "${group[@]}"
Using miller, it's no different really from a single-pass awk
solution, collecting the values in arrays:
mlr --icsv --implicit-csv-header put '
@client[NR] = $4;
@group[NR] = $5;
filter false;
end {
emit @client, "client";
emit @group, "group";
}
' Final
The equivalent of the above, as more readable (IMO) awk code:
awk -F, '
{client[NR] = $4; group[NR] = $5}
END {
for (i=1; i<=NR; i++) print "client=" client[i]
for (i=1; i<=NR; i++) print "group=" group[i]
}
' Final
Using csvtool is nice because it has a transpose function,
but it still needs help getting to the desired output
csvtool col 4,5 Final \
| csvtool cat <(echo "client,group") - \
| csvtool transpose - \
| awk -F, -v OFS="=" '{for (i=2; i<=NF; i++) print $1, $i}'
LOG="Failed,2021-12-07 22:30 EST,Scheduled Backup,abc,/clients/FORD_1030PM_EST_Windows2008,Windows File System
Failed,2021-12-07 22:00 EST,Scheduled Backup,def,/clients/FORD_10PM_EST_Windows2008,Windows File System"
getColumnTextSed(){
log="$1"
column=$2
[ -n "$log" -a -n "$column" ] && sed -E 's/([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),(.*)/\'$column'/mg;t;d' <<<$log
}
getColumnTextAWK(){
log="$1"
column=$2
[ -n "$log" -a -n "$column" ] && awk -F',' -v "c=$column" '{print $(c)}' <<<$log
}
echo "# with sed and column number"
echo "-------------------------------"
getColumnTextSed "$LOG" 4
getColumnTextSed "$LOG" 5
echo "-------------------------------"
echo "# with awk and column number"
echo "-------------------------------"
getColumnTextAWK "$LOG" 4
getColumnTextAWK "$LOG" 5
OUTPUT
# with sed and column number
-------------------------------
abc
def
/clients/FORD_1030PM_EST_Windows2008
/clients/FORD_10PM_EST_Windows2008
-------------------------------
# with awk and column number
-------------------------------
abc
def
/clients/FORD_1030PM_EST_Windows2008
/clients/FORD_10PM_EST_Windows2008

Arguments to change variable values in Bash script

I have this script in bash:
#!/bin/bash
dir="/home/dortiz/Prueba"
for i in $dir/*
do
cat $i | awk '{print $1" " $2" " $3" " $4"\n " $5}' | \
awk '/gi/{print ">" $0; getline; print}' | \
awk '$3>20.00 {print $0; getline; print;}' \
> "${i}.outsel"
done
cd /home/dortiz/Prueba
mv *.outsel /home/dortiz/Prueba2
I would like to pass an argument that changes the value after awk '$3> in an easy way from my main program, which will call this script.
I have read something about getopts, but I don't understand it at all.
Thanks a lot in advance.
The simplest way is to just pass an argument to your script:
yourscript.sh 20.0
Then in your script
#!/bin/bash
value=$1 # store the value passed in as the first parameter.
dir="/home/dortiz/Prueba"
for i in $dir/*; do
awk '{print $1" " $2" " $3" " $4"\n " $5}' "$i" |
awk '/gi/{print ">" $0; getline; print}' |
awk -v val="$value" '$3>val {print $0; getline; print;}' > "${i}.outsel"
# ^^^^^^^^^^^^^^^
done
...
and the cat|awk|awk|awk pipeline can probably be written like this:
awk -v val="$value" '
$3 > val {
prefix = /gi/ ? ">" : ""
print prefix $1 " " $2" " $3" " $4"\n " $5
}
' "$i" > "$i.outsel"
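Since the question mentions getopts, here is a rough sketch of the same idea with a named option instead of a positional argument. The option letter -t, the function name parse_threshold, and the default of 20.00 are all assumptions made for the example; the function body runs in a subshell so its variables do not leak into the caller.

```shell
# Hypothetical helper: parse "-t VALUE" and print the chosen threshold.
parse_threshold() (
    threshold=20.00          # default mirrors the value hard-coded in the question
    OPTIND=1
    while getopts ':t:' opt; do
        case $opt in
            t) threshold=$OPTARG ;;
            *) echo "usage: script [-t threshold]" >&2; exit 2 ;;
        esac
    done
    printf '%s\n' "$threshold"
)

value=$(parse_threshold -t 35.5)
default_value=$(parse_threshold)

# The parsed value is then handed to awk exactly as in the answer above.
kept=$(printf '%s\n' 'a b 40' 'c d 10' | awk -v val="$value" '$3 > val')
printf '%s\n' "$value" "$default_value" "$kept"
```

For a single value the positional $1 is simpler; getopts starts to pay off once the script takes several optional settings.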

Awk Bash If-Else Issue

I am unable to print an error message when 0 records are found.
This is what I have so far:
function search_title
{
awk -F':' -v search="$Title" '$2 ~ search { i++;} END { printf "%d records found\n", i }' test.txt
awk -F':' -v search="$Title" '$2 ~ search { i++; printf "%s, %s,%s,%s,%s\n", $1, $2, $3, $4, $5 } END {}' test.txt
}
function search_author
{
awk -F':' -v search="$Author" '$2 ~ search { i++;} END { printf "%d records found\n", i }' test.txt
awk -F':' -v search="$Author" '$2 ~ search { i++; printf "%s, %s,%s,%s,%s\n", $1, $2, $3, $4, $5 } END {}' test.txt
}
function search_both
{
awk -F':' -v search="$Title" -v search1="$Author" '$1 ~ search && $2 ~ search1 { i++;} END { printf "%d records found\n", i }' test.txt
awk -F':' -v search="$Title" -v search1="$Author" '$1 ~ search && $2 ~ search1 { i++; printf "%s, %s,%s,%s,%s\n", $1, $2, $3, $4, $5 } END {}' test.txt
}
read -p $'Title: ' Title
read -p $'Author: ' Author
if [ "$Title" == "" ];
then
search_author
elif [ "$Author" == "" ];
then
search_title
else
search_both
fi
I need an if-else statement in awk that checks whether the counter is 0 and, if so, prints "Error! Book does not exist".
For example,
Title input as DAFT
Author input as Linken
(Neither value is in test.txt)
"Error! Book does not exist"
instead of the current printf output, which is "0 records found"
You don't need two awk commands in each function; you can combine them into one:
awk -F':' -v search="$Title" -v search1="$Author" '$1 ~ search && $2 ~ search1 {
i++;
printf "%s, %s,%s,%s,%s\n", $1, $2, $3, $4, $5;
}
END {
if (!i)
print "Error! Book does not exist";
else
printf "%d records found\n", i;
}' test.txt
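If the calling script also needs to know whether anything matched, the END block can set awk's exit status. The following is a sketch rather than a drop-in replacement: it reads from standard input instead of test.txt, takes the search terms as arguments, and uses two made-up records in an assumed five-field colon-separated layout.

```shell
# Hypothetical variant of search_both that returns non-zero when nothing matches.
search_both() {
    awk -F: -v title="$1" -v author="$2" '
        $1 ~ title && $2 ~ author {
            n++
            printf "%s, %s,%s,%s,%s\n", $1, $2, $3, $4, $5
        }
        END {
            if (!n) {                 # n is unset (false) when no line matched
                print "Error! Book does not exist"
                exit 1
            }
            printf "%d records found\n", n
        }'
}

# Made-up records, for illustration only.
books='Dune:Frank Herbert:9.99:10:3
Emma:Jane Austen:5.00:4:1'

status=0
found=$(printf '%s\n' "$books" | search_both Dune Herbert) || status=$?
missing_status=0
missing=$(printf '%s\n' "$books" | search_both DAFT Linken) || missing_status=$?
printf '%s\n' "$found" "$missing"
```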

How can I print the duplicates in a file only once?

I have an input file that contains:
123,apple,orange
123,pineapple,strawberry
543,grapes,orange
790,strawberry,apple
870,peach,grape
543,almond,tomato
123,orange,apple
I want the output to be:
The following numbers are repeated:
123
543
Is there a way to get this output using awk? I'm writing the script in bash on Solaris.
sed -e 's/,/ , /g' <filename> | awk '{print $1}' | sort | uniq -d
awk -vFS=',' \
'{KEY=$1;if (KEY in KEYS) { DUPS[KEY]; }; KEYS[KEY]; } \
END{print "Repeated Keys:"; for (i in DUPS){print i} }' \
< yourfile
There are solutions with sort/uniq/cut as well (see above).
If you can live without awk, you can use this to get the repeating numbers:
cut -d, -f 1 my_file.txt | sort | uniq -d
Prints
123
543
Edit: (in response to your comment)
You can buffer the output and decide if you want to continue. For example:
out=$(cut -d, -f 1 a.txt | sort | uniq -d | tr '\n' ' ')
if [[ -n $out ]] ; then
echo "The following numbers are repeated: $out"
exit
fi
# continue...
This script will print only the numbers in the first column that appear more than once:
awk -F, '{a[$1]++}END{printf "The following numbers are repeated: ";for (i in a) if (a[i]>1) printf "%s ",i; print ""}' file
Or in a bit shorter form:
awk -F, 'BEGIN{printf "Repeated "}(a[$1]++ == 1){printf "%s ", $1}END{print ""} ' file
If you want to exit your script when a duplicate is found, you can exit with a non-zero exit code. For example:
awk -F, 'a[$1]++==1{dup=1}END{if (dup) {printf "The following numbers are repeated: ";for (i in a) if (a[i]>1) printf "%s ",i; print "";exit(1)}}' file
In your main script you can do:
awk -F, 'a[$1]++==1{dup=1}END{if (dup) {printf "The following numbers are repeated: ";for (i in a) if (a[i]>1) printf "%s ",i; print "";exit(-1)}}' file || exit -1
Or in a more readable format:
awk -F, '
a[$1]++==1{
dup=1
}
END{
if (dup) {
printf "The following numbers are repeated: ";
for (i in a)
if (a[i]>1)
printf "%s ",i;
print "";
exit(-1)
}
}
' file || exit -1
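One caveat with the for (i in a) loops above is that awk does not guarantee the order in which keys are visited. If the duplicates should appear one per line in the order they first repeat, as in the question's expected output, they can be recorded in a numerically indexed array. A sketch, with report_dups as a hypothetical helper name:

```shell
# Hypothetical helper: list first-column values that occur more than once,
# one per line, in the order their second occurrence is seen.
report_dups() {
    awk -F, '
        seen[$1]++ == 1 { dups[++n] = $1 }   # second sighting: record it once
        END {
            if (n) print "The following numbers are repeated:"
            for (i = 1; i <= n; i++) print dups[i]
        }'
}

out=$(printf '%s\n' \
    '123,apple,orange' \
    '123,pineapple,strawberry' \
    '543,grapes,orange' \
    '790,strawberry,apple' \
    '870,peach,grape' \
    '543,almond,tomato' \
    '123,orange,apple' | report_dups)
printf '%s\n' "$out"
```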

Resources