I have a text file in my folder named parameters.txt which contains
PP1 20 30 40 60
PP2 0 0 0 0
I'd like to use awk to read the different parameters depending on the value of the first text field in each line. At the moment, if I run
src_dir='/PP1/'
awk "$src_dir" '{ print $2 }' parameters.txt
I correctly get
20
I would simply like to store that 20 in a variable and export the variable itself.
Thanks in advance!
If you want to save the output, do var=$(awk expression):
result=$(awk -v value="$src_dir" '($1==value) { print $2 }' parameters.txt)
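Since the question also asks to export the result, a minimal sketch of that step (assuming src_dir holds the bare key PP1 rather than the /PP1/ regex form):
src_dir='PP1'
result=$(awk -v value="$src_dir" '($1==value) { print $2 }' parameters.txt)
export result    # make it visible to child processes
echo "$result"   # 20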
You can make your command more general by giving awk the variable with the -v syntax:
$ var="PP1"
$ awk -v v=$var '($1==v) { print $2 }' a
20
$ var="PP2"
$ awk -v v=$var '($1==v) { print $2 }' a
0
You don't really need awk for that. You can do it in bash.
$ src_dir="PP1"
$ while read -r pattern columns ; do
    set -- $columns
    if [[ $pattern =~ $src_dir ]]; then
        variable=$2
    fi
done < parameters.txt
shell_pattern=PP1
output_var=$(awk -v patt=$shell_pattern '$0 ~ patt {print $2}' file)
Note that $output_var may contain more than one value if the pattern matches more than one line. If you're only interested in the first value, then have the awk program exit after printing.
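For example, a sketch of that first-match-only variant (same file and variable as above):
output_var=$(awk -v patt="$shell_pattern" '$0 ~ patt {print $2; exit}' file)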
Related
I have a big text file with millions of log lines.
I would like to filter all the lines which satisfy the following criteria:
url should be url=/v2/testB
totalTime value should be greater than 500
INFO|id=1|totaltime=5000|httpmethod=POST|url=/v1/testA
INFO|id=2|totaltime=200|httpmethod=POST|url=/v2/testB
INFO|id=3|totaltime=1000|httpmethod=POST|url=/v2/testB
INFO|id=4|totaltime=501|httpmethod=POST|url=/v2/testB
Expected result:
id=3,totaltime=1000
id=4,totaltime=501
I have tried using multiple awk commands and then an if block; I wonder, can it be done more simply? Thanks!
while IFS= read -r line; do
    value=`echo $line|grep "url=/v2/testB" | awk -F"totaltime=" '{ print $2}'| awk -F"|" '{ print $1}'`
    if (( $value > 500 )); then
        echo $line
    fi
done < file.log
You may use this awk:
awk -F '|' -v OFS=, '$NF == "url=/v2/testB" {v=$3; sub(/^totaltime=/, "", v); if (v+0 > 500) print $2, $3}' file
id=3,totaltime=1000
id=4,totaltime=501
To make it more readable:
awk -F '|' -v OFS=, '
$NF == "url=/v2/testB" {
v = $3
sub(/^totaltime=/, "", v)
if (v+0 > 500)
print $2, $3
}' file
If you have gnu-awk then it can be reduced to:
awk -F '|' -v OFS=, '$NF == "url=/v2/testB" &&
gensub(/^totaltime=/, "", "1", $3)+0 > 500 {print $2, $3}' file
v+0 is shorthand in awk to convert a string value to a number.
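A quick illustration of that coercion:
$ echo " 007 " | awk '{ print $1 + 0 }'
7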
$ awk -F'|' -v OFS=',' '{split($3,t,/=/)} $5=="url=/v2/testB" && t[2]>500{print $2, $3}' file
id=3,totaltime=1000
id=4,totaltime=501
You seem to be in luck:
awk 'BEGIN{FS="|"; OFS=","}
     { url = substr($NF, index($NF,"=")+1)
       totaltime = substr($3, index($3,"=")+1)
     }
     (url == "/v2/testB") && (totaltime+0 > 500) { print $2, $3 }
' file
With your shown samples, please try the following awk program.
awk -F'\\||totaltime=' '$NF=="url=/v2/testB" && $4>500{print $2",totaltime="$4}' Input_file
Explanation: here is a detailed breakdown of the above code (a worked split of one sample line follows this explanation).
Set the field separators with awk's -F option.
The separators are | and totaltime=, applied to every line of Input_file.
In the main program, check two conditions:
a- $NF (the last field) is equal to url=/v2/testB, AND
b- the 4th field is greater than 500; if both hold, then:
print the 2nd field of the current line, followed by the string ,totaltime=, followed by the 4th field, as per the output required by the OP.
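To illustrate the splitting, here is how one sample line breaks into fields under that separator (a quick demonstration, not part of the original answer):
$ echo 'INFO|id=3|totaltime=1000|httpmethod=POST|url=/v2/testB' | awk -F'\\||totaltime=' '{for (i=1; i<=NF; i++) print i, "->", $i}'
1 -> INFO
2 -> id=3
3 ->
4 -> 1000
5 -> httpmethod=POST
6 -> url=/v2/testB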
All the awk solutions are great, and if awk is an option, use them.
If you wanted to fix your Bash effort, you can do:
while IFS='|' read -r id ti; do
[[ "${ti#*=}" -gt 500 ]] && printf "%s,%s\n" "$id" "$ti"
done < <(grep 'url=/v2/testB$' file | cut -d '|' -f 2,3)
Alternatively, you can eliminate cut and keep all five fields:
while IFS='|' read -r c1 c2 c3 c4 c5; do
[[ "${c3#*=}" -gt 500 ]] && printf "%s,%s\n" "$c2" "$c5"
done < <(grep 'url=/v2/testB$' file)
Either prints:
id=3,totaltime=1000
id=4,totaltime=501
I'm parsing source input files using a bash script. I'm generating delimited output in a file. I need a way to check that each field of the delimited output is populated. For example AA,BB,3,4,5,6,7,8 would be good and AA,,3,4,5,6,,8 would be bad. How do I check if there are blank fields on a line using sed/awk or some other tool I can put in a bash script? Thanks in advance!
With bash:
string='AA,,3,4,5,6,,8'
if [[ $string =~ ^,|,,|,$ ]]; then
    echo "error"
else
    echo "okay"
fi
Output:
error
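A quick check of both sample strings using the same test in a loop (a sketch; output shown after):
for s in 'AA,BB,3,4,5,6,7,8' 'AA,,3,4,5,6,,8'; do
    if [[ $s =~ ^,|,,|,$ ]]; then echo "$s -> error"; else echo "$s -> okay"; fi
done
Output:
AA,BB,3,4,5,6,7,8 -> okay
AA,,3,4,5,6,,8 -> error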
You can print the lines with at least one empty field using:
awk -F, '{for (i=1;i<=NF;i++) if ($i=="") {print; next}}'
-F, sets the field delimiter as ,
for (i=1;i<=NF;i++) iterates over the fields
if ($i=="") {print; next} prints the record if the field being tested is empty and goes to the next record
Example:
% cat file.txt
AA,BB,3,4,5,6,7,8
AA,,3,4,5,6,,8
% awk -F, '{for (i=1;i<=NF;i++) if ($i=="") {print; next}}' file.txt
AA,,3,4,5,6,,8
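If you want a pass/fail exit status for use in a bash script instead of printing the offending lines, the same per-field loop can drive the exit code (a sketch along the same lines, not part of the original answer):
if awk -F, '{for (i=1;i<=NF;i++) if ($i=="") exit 1}' file.txt; then
    echo "all fields populated"
else
    echo "found a line with an empty field"
fi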
You can test with a regular expression using a repeating group that fits your requirement:
grep -E '^([^,]+,)*[^,]+$' <<< "AA,,3,4,5,6,,8"
Testcode:
for str in "AA,BB,3,4,5,6,7,8" "AA,,3,4,5,6,,8" ; do
echo "==========="
echo "Testing >>>${str}<<<"
grep -Eq '^([^,]+,)*[^,]$' <<< "${str}" || echo "String incorrect"
done
You can grep the incorrect lines from a file using
grep -vE '^([^,]+,)*[^,]+$' inputfile
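For example, against a small file containing both sample lines (the file name input.csv is just for illustration):
$ printf '%s\n' 'AA,BB,3,4,5,6,7,8' 'AA,,3,4,5,6,,8' > input.csv
$ grep -vE '^([^,]+,)*[^,]+$' input.csv
AA,,3,4,5,6,,8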
I am reading a file (test.log.csv) line by line until the end of the file, and I want to extract the value in the 4th column of the current line and output it to a text file (output.txt).
For example, right now I have read up to the 2nd line (INSERT,SLT_TEST_1,TEST,1127192896,0,DEBUG1) and I want to extract the number in column 4 of the current line and output it to a text file named output.txt.
test.log.csv
INSERT,SLT_TEST_1,TEST,1127192896,0,DEBUG1
INSERT,SLT_TEST_1,TEST,1127192896,0,DEBUG1
INSERT,SLT_TEST_1,TEST,1127192896,0,DEBUG1
The desired output is
output.txt
1127192896
1127192896
1127192896
Right now my script is as below
#! /bin/bash
clear
rm /home/mobaxterm/Script/output.txt
while IFS= read -r line
do
    if [[ $line == *"INSERT"* ]] && [[ $line == *"$1"* ]]
    then
        echo $line >> /home/mobaxterm/Script/output.txt
        lastID=$(awk -F "," '{if (NR==curLine) { print $4 }}' curLine="${lineCount}")
        echo $lastID
    else
        if [ lastID == "$1" ]
        then
            echo $line >> /home/mobaxterm/Script/output.txt
        fi
    fi
    lineCount=$(($lineCount+1))
done < "/home/mobaxterm/Script/test.log.csv"
The parameter ($1) will be 1127192896
I tried declaring a counter in the loop and comparing NR with the counter, but the script just stopped after it found the first one.
Find all the lines where the 4th field is 1127192896 and output the 4th field:
awk -F, -v SEARCH="1127192896" '$4 ~ SEARCH {print $4}' test.log.csv
1127192896
1127192896
1127192896
Find all the lines containing the word "INSERT" and where the 4th field is 1127192896
awk -F, -v SEARCH="1127192896" '$4 ~ SEARCH && /INSERT/ {print $4}' test.log.csv
If you have the number you want to look for in a variable called $1, put that in place of the 1127192896, like this:
awk -F, -v SEARCH="$1" '$4 ~ SEARCH && /INSERT/ {print $4}' test.log.csv
You can combine variable substitution with the definition of an array.
array_variable=( ${line//,/ } )
sth_you_need=${array_variable[3]}
Or you can just use awk or cut:
sth_you_need=$(echo $line | awk -F, '{print $4}')
# or
sth_you_need=$(echo $line | cut -d, -f4)
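A quick check with one of the sample lines from the question:
$ line='INSERT,SLT_TEST_1,TEST,1127192896,0,DEBUG1'
$ echo $line | cut -d, -f4
1127192896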
I'm using the following awk command:
my_command | awk -F "[[:space:]]{2,}+" 'NR>1 {print $2}' | egrep "^[[:alnum:]]"
which successfully returns my data like this:
fileName1
file Name 1
file Nameone
f i l e Name 1
So as you can see some file names have spaces. This is fine as I'm just trying to echo the file name (nothing special). The problem is calling that specific row within a loop. I'm trying to do it this way:
i=1
for num in $rows
do
    fileName=$(my_command | awk -F "[[:space:]]{2,}+" 'NR==$i {print $2}' | egrep "^[[:alnum:]])"
    echo "$num $fileName"
    $((i++))
done
But my output is always null
I've also tried using awk -v record=$i and then printing $record but I get the below results.
f i l e Name 1
EDIT
Sorry for the confusion: rows is a variable that lists ids, like this: 11 12 13,
and each one of those ids ties to a file name. The output of my command, without any parsing, looks like this:
id    File Info      OS
11    File Name1     OS1
12    Fi leNa me2    OS2
13    FileName 3     OS3
I can only use the id field to run the command that I need, but I want to use the File Info field to notify the user of the actual file that the command is being executed against.
I think your $i does not expand as expected. You should quote your arguments this way:
fileName=$(my_command | awk -F "[[:space:]]{2,}+" "NR==$i {print \$2}" | egrep "^[[:alnum:]]")
And you forgot the other ).
EDIT
As an update to your requirement, you could just pass the rows to a single awk command instead of running a repetitive one inside a loop:
#!/bin/bash
ROWS=(11 12)
function my_command {
    # This function just emulates my_command and should be removed later.
    echo "  id    File Info      OS
  11    File Name1     OS1
  12    Fi leNa me2    OS2
  13    FileName 3     OS3"
}
awk -- '
BEGIN {
    input = ARGV[1]
    while (getline line < input) {
        sub(/^ +/, "", line)
        split(line, a, /  +/)
        for (i = 2; i < ARGC; ++i) {
            if (a[1] == ARGV[i]) {
                printf "%s %s\n", a[1], a[2]
                break
            }
        }
    }
    exit
}
' <(my_command) "${ROWS[@]}"
That awk command could be condensed to one line as:
awk -- 'BEGIN { input = ARGV[1]; while (getline line < input) { sub(/^ +/, "", line); split(line, a, /  +/); for (i = 2; i < ARGC; ++i) { if (a[1] == ARGV[i]) { printf "%s %s\n", a[1], a[2]; break } } } exit }' <(my_command) "${ROWS[@]}"
Or better yet just use Bash instead as a whole:
#!/bin/bash
shopt -s extglob  # the +( ) pattern below relies on extended globbing
ROWS=(11 12)
while IFS=$' ' read -r LINE; do
    IFS='|' read -ra FIELDS <<< "${LINE// +( )/|}"
    for R in "${ROWS[@]}"; do
        if [[ ${FIELDS[0]} == "$R" ]]; then
            echo "${R} ${FIELDS[1]}"
            break
        fi
    done
done < <(my_command)
It should give an output like:
11 File Name1
12 Fi leNa me2
Shell variables aren't expanded inside single-quoted strings. Use the -v option to set an awk variable to the shell variable:
fileName=$(my_command | awk -v i=$i -F "[[:space:]]{2,}+" 'NR==i {print $2}' | egrep "^[[:alnum:]]")
This method avoids having to escape all the $ characters in the awk script, as required in konsolebox's answer.
As you already heard, you need to populate an awk variable from your shell variable to be able to use the desired value within the awk script, so this:
awk -F "[[:space:]]{2,}+" 'NR==$i {print $2}' | egrep "^[[:alnum:]]"
should be this:
awk -v i="$i" -F "[[:space:]]{2,}+" 'NR==i {print $2}' | egrep "^[[:alnum:]]"
Also, though, you don't need awk AND grep since awk can do anything grep can do, so you can change this part of your script:
awk -v i="$i" -F "[[:space:]]{2,}+" 'NR==i {print $2}' | egrep "^[[:alnum:]]"
to this:
awk -v i="$i" -F "[[:space:]]{2,}+" '(NR==i) && ($2~/^[[:alnum:]]/){print $2}'
and you don't need a + after a numeric range so you can change {2,}+ to just {2,}:
awk -v i="$i" -F "[[:space:]]{2,}" '(NR==i) && ($2~/^[[:alnum:]]/){print $2}'
Most importantly, though, instead of invoking awk once for every invocation of my_command, you can just invoke it once for all of them, i.e. instead of this (assuming this does what you want):
i=1
for num in $rows
do
    fileName=$(my_command | awk -v i="$i" -F "[[:space:]]{2,}" '(NR==i) && ($2~/^[[:alnum:]]/){print $2}')
    echo "$num $fileName"
    $((i++))
done
you can do something more like this:
for num in $rows
do
    my_command
done |
awk -F '[[:space:]]{2,}' '$2~/^[[:alnum:]]/{print NR, $2}'
I say "something like" because you don't tell us what "my_command", "rows" or "num" are so I can't be precise but hopefully you see the pattern. If you give us more info we can provide a better answer.
It's pretty inefficient to rerun my_command (and awk) every time through the loop just to extract one line from its output. Especially when all you're doing is printing out part of each line in order. (I'm assuming that my_command really is exactly the same command and produces the same output every time through your loop.)
If that's the case, this one-liner should do the trick:
paste -d' ' <(printf '%s\n' $rows) <(my_command |
    awk -F '[[:space:]]{2,}+' '($2 ~ /^[[:alnum:]]/) {print $2}')
I have this command which executes correctly if run directly on the terminal.
awk '/word/ {print NR}' file.txt | head -n 1
The purpose is to find the line number of the line on which the word 'word' first appears in file.txt.
But when I put it in a script file, it doesn't seem to work.
#! /bin/sh
if [ $# -ne 2 ]
then
echo "Usage: $0 <word> <filename>"
exit 1
fi
awk '/$1/ {print NR}' $2 | head -n 1
So what did I do wrong?
Thanks,
Replace the single quotes with double quotes so that the $1 is evaluated by the shell:
awk "/$1/ {print NR}" $2 | head -n 1
In the shell, single-quotes prevent parameter-substitution; so if your script is invoked like this:
script.sh word file.txt
then you want to run this AWK program:
/word/ {print NR}
but you're actually running this one:
/$1/ {print NR}
and needless to say, AWK has no idea what $1 is supposed to be.
To fix this, change your single-quotes to double-quotes:
awk "/$1/ {print NR}" $2 | head -n 1
so that the shell will substitute word for $1.
You should use AWK's variable passing feature:
awk -v patt="$1" '$0 ~ patt {print NR; exit}' "$2"
The exit makes the head -1 unnecessary.
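Putting that into the original script, a sketch of the fixed version (same usage as before):
#! /bin/sh
if [ $# -ne 2 ]
then
    echo "Usage: $0 <word> <filename>"
    exit 1
fi
awk -v patt="$1" '$0 ~ patt {print NR; exit}' "$2"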
You could also pass the value as a variable to awk:
awk -v varA="$1" '{if(match($0,varA)>0){print NR;}}' "$2" | head -n 1
Seems more cumbersome than the above, but illustrates passing vars.