How to use for i loop in search pattern in awk

I am trying to count the strings that end in a number in a large data file, using a for loop over i to search for each of them in turn. Here is my code:
#!/bin/bash
for (( i=2; i<=253; i++ ))
do
awk -F "\t" '$3 ~ /^names.i$/ {++c} END {print c}' myfile >> output.txt
done
For some reason, although running awk on its own gives the right output, the script produces only blank lines. What am I doing wrong?

Just do the whole thing in 1 awk invocation:
awk -F '\t' '
{ split($3,arr,/\./); ++c[arr[2]] }
END { for (i=2;i <= 253;i++) print c[i]+0 }
' myfile > output.txt
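As a quick check of the single-invocation approach, here is a minimal sketch on made-up data. The sample file contents and the shortened range 2..4 are assumptions for illustration only; the real file would use 2..253.

```shell
#!/bin/bash
# Hypothetical sample: tab-separated, third field looks like "names.<number>".
sample=$(mktemp)
printf 'a\tb\tnames.2\n' >  "$sample"
printf 'c\td\tnames.3\n' >> "$sample"
printf 'e\tf\tnames.2\n' >> "$sample"

# One pass: split field 3 on the dot and count by the numeric suffix.
# c[i] + 0 prints 0 for suffixes that never occurred.
counts=$(awk -F '\t' '
    { split($3, arr, /\./); ++c[arr[2]] }
    END { for (i = 2; i <= 4; i++) print c[i] + 0 }
' "$sample")

echo "$counts"
rm -f "$sample"
```

The file is read once, no matter how many suffixes are counted, which is the main gain over calling awk 252 times.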

You can't use shell variable i directly in awk like that. Pass it to awk first:
for (( i=2; i<=253; i++ ))
do
awk -v i="$i" -F "\t" '$3 ~ ("^names\\." i "$") {++c} END {print c+0}' myfile >> output.txt
done
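To see the difference between a letter i inside a regex literal and an awk variable i, here is a small sketch. The sample value names.7 and the variable names literal and dynamic are made up for the demonstration.

```shell
#!/bin/bash
line='names.7'
i=7

# Inside /.../ the "i" is a literal letter, so this never matches names.7.
literal=$(printf '%s\n' "$line" | awk '$1 ~ /^names.i$/ {n++} END {print n + 0}')

# With -v, awk gets its own variable i and the pattern is built by concatenation.
dynamic=$(printf '%s\n' "$line" | awk -v i="$i" '$1 ~ ("^names\\." i "$") {n++} END {print n + 0}')

echo "literal=$literal dynamic=$dynamic"
```

The doubled backslash is needed because the pattern is a string here: awk first turns "\\." into \. and only then compiles it as a regex.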

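Try this (a regex literal cannot contain the awk variable i, so the pattern is built as a string):
awk -F "\t" '{for (i=2;i<=253;i++) if ($3 ~ ("^names\\." i "$")) ++c} END {print c+0}' myfile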

Related

Iterating through Comma Separated rows in loop in Shell

Final.txt
Failed,2021-12-07 22:30 EST,Scheduled Backup,abc,/clients/FORD_1030PM_EST_Windows2008,Windows File System
Failed,2021-12-07 22:00 EST,Scheduled Backup,def,/clients/FORD_10PM_EST_Windows2008,Windows File System
I want to iterate through these rows instead of column
Expected Output
client=abc
client=def
group=/clients/FORD_1030PM_EST_Windows2008
group=/clients/FORD_10PM_EST_Windows2008
I tried this
while read line ; do
group=$(awk -F',' '{print $4}')
client=$(awk -F',' '{print $5}')
echo $group
echo $client
done < Final
This is not working, but when I run the command on its own like this
cat Final | awk -F',' '{print $4}'
it gives me the expected output. It only fails when I try it inside the loop.

With GNU awk:
awk -F ',' 'BEGINFILE{f++}
f==1{print "client=" $4}
f==2{print "group=" $5}
' Final Final
Output:
client=abc
client=def
group=/clients/FORD_1030PM_EST_Windows2008
group=/clients/FORD_10PM_EST_Windows2008

One-pass awk solution, storing field $5 in an array for printing at the end:
$ awk -F, '{print $4; groups[NR]=$5} END {for (i=1;i<=NR;i++) print groups[i]}' Final.txt
abc
def
/clients/FORD_1030PM_EST_Windows2008
/clients/FORD_10PM_EST_Windows2008
Two-pass awk that eliminates need to store field $5 in an array:
$ awk -F, 'FNR==NR {print $4;next} {print $5}' Final.txt Final.txt
abc
def
/clients/FORD_1030PM_EST_Windows2008
/clients/FORD_10PM_EST_Windows2008

with bash
declare -a client group
while IFS=, read -ra fields; do
client+=("${fields[3]}")
group+=("${fields[4]}")
done < Final
printf 'client=%s\n' "${client[@]}"
printf 'group=%s\n' "${group[@]}"
Using miller, it's no different really from a single-pass awk
solution, collecting the values in arrays:
mlr --icsv --implicit-csv-header put '
  @client[NR] = $4;
  @group[NR] = $5;
  filter false;
  end {
    emit @client, "client";
    emit @group, "group";
  }
' Final
The equivalent of the above, as more readable (IMO) awk code:
awk -F, '
{client[NR] = $4; group[NR] = $5}
END {
for (i=1; i<=NR; i++) print "client=" client[i]
for (i=1; i<=NR; i++) print "group=" group[i]
}
' Final
Using csvtool is nice because it has a transpose function,
but it still needs help getting to the desired output
csvtool col 4,5 Final \
| csvtool cat <(echo "client,group") - \
| csvtool transpose - \
| awk -F, -v OFS="=" '{for (i=2; i<=NF; i++) print $1, $i}'
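For completeness, the loop in the question fails because awk is called without an input file, so it reads the loop's standard input and swallows the remaining lines of the file. A minimal sketch of a per-line fix, feeding the current line to awk with a here-string, is shown below. The temporary file and the variable pairs are assumptions for the demo, and the output is one line per row rather than the grouped layout asked for.

```shell
#!/bin/bash
final=$(mktemp)
cat > "$final" <<'EOF'
Failed,2021-12-07 22:30 EST,Scheduled Backup,abc,/clients/FORD_1030PM_EST_Windows2008,Windows File System
Failed,2021-12-07 22:00 EST,Scheduled Backup,def,/clients/FORD_10PM_EST_Windows2008,Windows File System
EOF

pairs=$(
    while IFS= read -r line; do
        # Give awk the current line explicitly; without the here-string it
        # would read the rest of the loop's standard input instead.
        client=$(awk -F',' '{print $4}' <<< "$line")
        group=$(awk -F',' '{print $5}' <<< "$line")
        echo "client=$client group=$group"
    done < "$final"
)
echo "$pairs"
rm -f "$final"
```

This starts two awk processes per line, so the single-pass awk solutions above are still the better choice for large files.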

LOG="Failed,2021-12-07 22:30 EST,Scheduled Backup,abc,/clients/FORD_1030PM_EST_Windows2008,Windows File System
Failed,2021-12-07 22:00 EST,Scheduled Backup,def,/clients/FORD_10PM_EST_Windows2008,Windows File System"
getColumnTextSed(){
log="$1"
column=$2
[ -n "$log" -a -n "$column" ] && sed -E 's/([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),(.*)/\'$column'/mg;t;d' <<<$log
}
getColumnTextAWK(){
log="$1"
column=$2
[ -n "$log" -a -n "$column" ] && awk -F',' -v "c=$column" '{print $(c)}' <<<$log
}
echo "# with sed and column number"
echo "-------------------------------"
getColumnTextSed "$LOG" 4
getColumnTextSed "$LOG" 5
echo "-------------------------------"
echo "# with awk and column number"
echo "-------------------------------"
getColumnTextAWK "$LOG" 4
getColumnTextAWK "$LOG" 5
OUTPUT
# with sed and column number
-------------------------------
abc
def
/clients/FORD_1030PM_EST_Windows2008
/clients/FORD_10PM_EST_Windows2008
-------------------------------
# with awk and column number
-------------------------------
abc
def
/clients/FORD_1030PM_EST_Windows2008
/clients/FORD_10PM_EST_Windows2008

bash paste: loop through pairs of files based on wildcard, generate separate output files

I am trying to get the paste command to loop through pairs of files, pasting them together and outputting each as a unique file.
I've tried a lot of things, here are a few:
for i in *_temp4.csv; do paste *_temp4.csv *_temp44.csv > ${i}_out.csv; done
#Each output contains each input file (rather than pairs). Obviously this is because of the * wildcard
for i in *_temp2.csv_temp4.csv; do paste $_temp2_temp4.csv $_temp3_temp44.csv > ${i}_out.csv; done
no error, empty output files
for i in *_temp2.csv_temp4.csv; do paste ${_temp2_temp4.csv} ${_temp3_temp44.csv} > ${i}_out.csv; done
output:
combo15.awk: line 12: ${_temp2_temp4.csv}: bad substitution
I think I must be missing something very basic about how $ gets used, but I've been googling all night to no avail.
my entire code, for context, although I don't see why the previous lines should influence anything about this.
for i in *.dat; do awk 'NR > 23 { print }' ${i} > ${i}_temp1.csv; done
for i in *_temp1.csv; do awk 'BEGIN{OFS=FS=","}$2==0{$2="between"}BEGIN{OFS=FS=","}$2==1{$2="lego"}BEGIN{OFS=FS=","}$2==2{$2="pin"}BEGIN{OFS=FS=","}$2==3{$2="dice"}BEGIN{OFS=FS=","}$2==4{$2="jack"}BEGIN{OFS=FS=","}$2==8{$2="escape"}{print}' ${i} > ${i}_temp2.csv; done
for i in *_temp2.csv; do awk -v OFS="," '{$4 = $1 - prev1; prev1 = $1; print;}' ${i} > ${i}_temp3.csv; done
for i in *_temp2.csv; do awk -F "," 'BEGIN{print "new line"}{print $2}' ${i} > ${i}_temp4.csv; done
for i in *_temp3.csv; do awk -F "," '{print $5}' ${i} > ${i}_temp44.csv; done
for i in *_temp2.csv_temp4.csv; do paste $_temp2_temp4.csv $_temp3_temp44.csv > ${i}_out.csv; done

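The problem is that your file names keep growing: each loop appends another suffix to the previous output name, which makes the pairs hard to address. Looping over the original *.dat files at every step and building each name from ${i} solves this: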
for i in *.dat; do awk 'NR > 23 { print }' ${i} > ${i}_temp1.csv; done
for i in *.dat; do awk 'BEGIN{OFS=FS=","}$2==0{$2="between"}BEGIN{OFS=FS=","}$2==1{$2="lego"}BEGIN{OFS=FS=","}$2==2{$2="pin"}BEGIN{OFS=FS=","}$2==3{$2="dice"}BEGIN{OFS=FS=","}$2==4{$2="jack"}BEGIN{OFS=FS=","}$2==8{$2="escape"}{print}' ${i}_temp1.csv > ${i}_temp2.csv; done
for i in *.dat; do awk -v OFS="," '{$4 = $1 - prev1; prev1 = $1; print;}' ${i}_temp2.csv > ${i}_temp3.csv; done
for i in *.dat; do awk -F "," 'BEGIN{print "new line"}{print $2}' ${i}_temp2.csv > ${i}_temp4.csv; done
for i in *.dat; do awk -F "," '{print $5}' ${i}_temp3.csv > ${i}_temp44.csv; done
for i in *.dat; do paste ${i}_temp4.csv ${i}_temp44.csv > ${i}_out.csv; done
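As for the failed attempts: $_temp2_temp4.csv is parsed as the (unset) variable _temp2_temp4 followed by .csv, and ${_temp2_temp4.csv} is a bad substitution because a dot cannot appear in a variable name. If you would rather loop over the intermediate files, the common prefix can be recovered with the ${var%suffix} expansion. A minimal sketch, with made-up file contents in a scratch directory:

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-ins for the intermediate files of two inputs.
printf 'lego\npin\n' > a.dat_temp4.csv
printf '1\n2\n'      > a.dat_temp44.csv
printf 'dice\n'      > b.dat_temp4.csv
printf '3\n'         > b.dat_temp44.csv

for f in *_temp4.csv; do
    base=${f%_temp4.csv}    # strip the suffix to recover the common prefix
    paste "${base}_temp4.csv" "${base}_temp44.csv" > "${base}_out.csv"
done

result=$(cat a.dat_out.csv)
echo "$result"

cd / && rm -rf "$workdir"
```

Each *_temp4.csv file is paired with the *_temp44.csv file that shares its prefix, and every pair gets its own output file.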

awk working with intervals

I have this file
goodtime 20:30 21:40
badtime 19:52 24:00
and when I enter for example 21:00 and 21:15 I should get goodtime
So here's my script
#!/bin/sh
last > duom.txt
grep -F 'stud.if.ktu.lt' duom.txt > ktu.txt
echo "Nurodykite laiko intervala "  # Lithuanian: "Specify the time interval"
read h
read min
read h2
read min2
awk '{if ($2 ~ /$h.$m/ && $3 ~ /$h2.$min2/) print $1}' data.txt
But I don't get any results.

The problem with this:
awk '{if ($2 ~ /$h.$m/ && $3 ~ /$h2.$min2/) print $1}' data.txt
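is that you're trying to use shell variables inside a single-quoted string, where the shell does not expand them (the pattern also refers to $m, while the script reads min). You need to pass the shell variables into awk with its -v option: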
awk -v patt1="$h.$min" -v patt2="$h2.$min2" '
$2 ~ patt1 && $3 ~ patt2 {print $1}
' data.txt
But, given your sample input, this will not match anything.
Until your requirements are clarified, I can't help with the logic.

How can I specify a row in awk in for loop?

I'm using the following awk command:
my_command | awk -F "[[:space:]]{2,}+" 'NR>1 {print $2}' | egrep "^[[:alnum:]]"
which successfully returns my data like this:
fileName1
file Name 1
file Nameone
f i l e Name 1
So as you can see some file names have spaces. This is fine as I'm just trying to echo the file name (nothing special). The problem is calling that specific row within a loop. I'm trying to do it this way:
i=1
for num in $rows
do
fileName=$(my_command | awk -F "[[:space:]]{2,}+" 'NR==$i {print $2}' | egrep "^[[:alnum:]])"
echo "$num $fileName"
$((i++))
done
But my output is always null
I've also tried using awk -v record=$i and then printing $record but I get the below results.
f i l e Name 1
EDIT
Sorry for the confusion: rows is a variable that list ids like this 11 12 13
and each one of those ids ties to a file name. My command without doing any parsing looks like this:
id  File Info     OS
11  File Name1    OS1
12  Fi leNa me2   OS2
13  FileName 3    OS3
I can only use the id field to run a the command that I need, but I want to use the File Info field to notify the user of the actual File that the command is being executed against.

I think your $i does not expand as expected. You should quote your arguments this way:
fileName=$(my_command | awk -F "[[:space:]]{2,}+" "NR==$i {print \$2}" | egrep "^[[:alnum:]]")
And you forgot the other ).
EDIT
Given your updated requirement, you could pass the rows to a single awk command instead of running it repeatedly inside a loop:
#!/bin/bash
ROWS=(11 12)
function my_command {
    # This function just emulates my_command and should be removed later.
    echo " id  File Info     OS
 11  File Name1    OS1
 12  Fi leNa me2   OS2
 13  FileName 3    OS3"
}
awk -- '
BEGIN {
    input = ARGV[1]
    # "> 0" stops the loop on end of file and on read errors alike.
    while ((getline line < input) > 0) {
        sub(/^ +/, "", line)
        split(line, a, /  +/)    # split on runs of two or more spaces
        for (i = 2; i < ARGC; ++i) {
            if (a[1] == ARGV[i]) {
                printf "%s %s\n", a[1], a[2]
                break
            }
        }
    }
    exit
}
' <(my_command) "${ROWS[@]}"
That awk command could be condensed to one line as:
awk -- 'BEGIN { input = ARGV[1]; while ((getline line < input) > 0) { sub(/^ +/, "", line); split(line, a, /  +/); for (i = 2; i < ARGC; ++i) { if (a[1] == ARGV[i]) { printf "%s %s\n", a[1], a[2]; break } } }; exit }' <(my_command) "${ROWS[@]}"
Or better yet just use Bash instead as a whole:
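#!/bin/bash
shopt -s extglob    # needed for the +( ) pattern below
ROWS=(11 12)
while IFS=$' ' read -r LINE; do
    # Turn every run of two or more spaces into a single | and split on it.
    IFS='|' read -ra FIELDS <<< "${LINE// +( )/|}"
    for R in "${ROWS[@]}"; do
        if [[ ${FIELDS[0]} == "$R" ]]; then
            echo "${R} ${FIELDS[1]}"
            break
        fi
    done
done < <(my_command)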
It should give an output like:
11 File Name1
12 Fi leNa me2

Shell variables aren't expanded inside single-quoted strings. Use the -v option to set an awk variable to the shell variable:
fileName=$(my_command | awk -v i="$i" -F "[[:space:]]{2,}+" 'NR==i {print $2}' | egrep "^[[:alnum:]]")
This method avoids having to escape all the $ characters in the awk script, as required in konsolebox's answer.

As you have already heard, you need to populate an awk variable from your shell variable to be able to use the desired value within the awk script, so this:
awk -F "[[:space:]]{2,}+" 'NR==$i {print $2}' | egrep "^[[:alnum:]]"
should be this:
awk -v i="$i" -F "[[:space:]]{2,}+" 'NR==i {print $2}' | egrep "^[[:alnum:]]"
Also, though, you don't need awk AND grep, since awk can do anything grep can do, so you can change this part of your script:
awk -v i="$i" -F "[[:space:]]{2,}+" 'NR==i {print $2}' | egrep "^[[:alnum:]]"
to this:
awk -v i="$i" -F "[[:space:]]{2,}+" '(NR==i) && ($2~/^[[:alnum:]]/){print $2}'
and you don't need a + after a numeric range so you can change {2,}+ to just {2,}:
awk -v i="$i" -F "[[:space:]]{2,}" '(NR==i) && ($2~/^[[:alnum:]]/){print $2}'
Most importantly, though, instead of invoking awk once for every invocation of my_command, you can just invoke it once for all of them, i.e. instead of this (assuming this does what you want):
i=1
for num in $rows
do
fileName=$(my_command | awk -v i="$i" -F "[[:space:]]{2,}" '(NR==i) && ($2~/^[[:alnum:]]/){print $2}')
echo "$num $fileName"
((i++))
done
you can do something more like this:
for num in $rows
do
my_command
done |
awk -F '[[:space:]]{2,}' '$2~/^[[:alnum:]]/{print NR, $2}'
I say "something like" because you don't tell us what "my_command", "rows" or "num" are so I can't be precise but hopefully you see the pattern. If you give us more info we can provide a better answer.
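With the details from the question's edit (rows holds ids such as 11 12 13, and the File Info column belongs to each id), one possible shape of the single-invocation idea is sketched below. The my_command function is only a stand-in that prints the sample table, and the variable report and the awk array want are names made up for this example.

```shell
#!/bin/bash
rows="11 13"    # ids to look up

# Stand-in for my_command; columns are separated by two or more spaces.
my_command() {
    printf '%s\n' \
        'id  File Info    OS' \
        '11  File Name1   OS1' \
        '12  Fi leNa me2  OS2' \
        '13  FileName 3   OS3'
}

# Run my_command once; awk keeps only the rows whose id is in the list.
# The separator "  +" (two spaces and a plus) means "two or more spaces".
report=$(my_command | awk -F '  +' -v ids="$rows" '
    BEGIN { n = split(ids, list, " "); for (k = 1; k <= n; k++) want[list[k]] = 1 }
    NR > 1 && ($1 in want) { print $1, $2 }
')
echo "$report"
```

Matching on the id column instead of the row number keeps the lookup correct even if the ids in rows are not in the same order as the table.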

It's pretty inefficient to rerun my_command (and awk) every time through the loop just to extract one line from its output. Especially when all you're doing is printing out part of each line in order. (I'm assuming that my_command really is exactly the same command and produces the same output every time through your loop.)
If that's the case, this one-liner should do the trick:
paste -d' ' <(printf '%s\n' $rows) <(my_command |
awk -F '[[:space:]]{2,}' '($2 ~ /^[[:alnum:]]/) {print $2}')

How can I print the duplicates in a file only once?

I have an input file that contains:
123,apple,orange
123,pineapple,strawberry
543,grapes,orange
790,strawberry,apple
870,peach,grape
543,almond,tomato
123,orange,apple
I want the output to be:
The following numbers are repeated:
123
543
Is there a way to get this output using awk? I'm writing the script in bash on Solaris.

sed -e 's/,/ , /g' <filename> | awk '{print $1}' | sort | uniq -d

awk -vFS=',' \
'{KEY=$1;if (KEY in KEYS) { DUPS[KEY]; }; KEYS[KEY]; } \
END{print "Repeated Keys:"; for (i in DUPS){print i} }' \
< yourfile
There are solutions with sort/uniq/cut as well (see above).

If you can live without awk, you can use this to get the repeating numbers:
cut -d, -f 1 my_file.txt | sort | uniq -d
Prints
123
543
Edit: (in response to your comment)
You can buffer the output and decide if you want to continue. For example:
out=$(cut -d, -f 1 my_file.txt | sort | uniq -d | tr '\n' ' ')
if [[ -n $out ]] ; then
echo "The following numbers are repeated: $out"
exit
fi
# continue...

This script prints only the numbers in the first column that appear more than once:
awk -F, '{a[$1]++}END{printf "The following numbers are repeated: ";for (i in a) if (a[i]>1) printf "%s ",i; print ""}' file
Or in a bit shorter form:
awk -F, 'BEGIN{printf "Repeated "}(a[$1]++ == 1){printf "%s ", $1}END{print ""} ' file
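The a[$1]++ == 1 test is what makes each duplicate print only once: the post-increment returns the old count, so the condition is true only the second time a key is seen. A minimal sketch on the sample input from the question, printing one number per line to match the requested layout (the temporary file and the variable dups are assumptions for the demo):

```shell
#!/bin/bash
input=$(mktemp)
cat > "$input" <<'EOF'
123,apple,orange
123,pineapple,strawberry
543,grapes,orange
790,strawberry,apple
870,peach,grape
543,almond,tomato
123,orange,apple
EOF

# True only on the second sighting of a key, so 123 is printed once
# even though it occurs three times.
dups=$(awk -F, 'a[$1]++ == 1 { print $1 }' "$input")

echo "The following numbers are repeated:"
echo "$dups"
rm -f "$input"
```

The numbers come out in the order in which their second occurrence appears in the file, so no sort step is needed.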
If you want your script to stop when a duplicate is found, have awk exit with a non-zero exit code. For example:
awk -F, 'a[$1]++==1{dup=1}END{if (dup) {printf "The following numbers are repeated: ";for (i in a) if (a[i]>1) printf "%s ",i; print "";exit(1)}}' file
In your main script you can do:
awk -F, 'a[$1]++==1{dup=1}END{if (dup) {printf "The following numbers are repeated: ";for (i in a) if (a[i]>1) printf "%s ",i; print "";exit(-1)}}' file || exit -1
Or in a more readable format:
awk -F, '
a[$1]++==1{
dup=1
}
END{
if (dup) {
printf "The following numbers are repeated: ";
for (i in a)
if (a[i]>1)
printf "%s ",i;
print "";
exit(-1)
}
}
' file || exit -1
