How can I specify a row in awk in for loop? - bash

I'm using the following awk command:
my_command | awk -F "[[:space:]]{2,}+" 'NR>1 {print $2}' | egrep "^[[:alnum:]]"
which successfully returns my data like this:
fileName1
file Name 1
file Nameone
f i l e Name 1
So as you can see some file names have spaces. This is fine as I'm just trying to echo the file name (nothing special). The problem is calling that specific row within a loop. I'm trying to do it this way:
i=1
for num in $rows
do
fileName=$(my_command | awk -F "[[:space:]]{2,}+" 'NR==$i {print $2}' | egrep "^[[:alnum:]])"
echo "$num $fileName"
$((i++))
done
But my output is always null
I've also tried using awk -v record=$i and then printing $record but I get the below results.
f i l e Name 1
EDIT
Sorry for the confusion: rows is a variable that list ids like this 11 12 13
and each one of those ids ties to a file name. My command without doing any parsing looks like this:
id File Info OS
11 File Name1 OS1
12 Fi leNa me2 OS2
13 FileName 3 OS3
I can only use the id field to run a the command that I need, but I want to use the File Info field to notify the user of the actual File that the command is being executed against.

I think your $i does not expand as expected. You should quote your arguments this way:
fileName=$(my_command | awk -F "[[:space:]]{2,}+" "NR==$i {print \$2}" | egrep "^[[:alnum:]]")
And you forgot the other ).
EDIT
As an update to your requirement you could just pass the rows to a single awk command instead of a repeatitive one inside a loop:
#!/bin/bash
ROWS=(11 12)
function my_command {
# This function just emulates my_command and should be removed later.
echo " id File Info OS
11 File Name1 OS1
12 Fi leNa me2 OS2
13 FileName 3 OS3"
}
awk -- '
BEGIN {
input = ARGV[1]
while (getline line < input) {
sub(/^ +/, "", line)
split(line, a, / +/)
for (i = 2; i < ARGC; ++i) {
if (a[1] == ARGV[i]) {
printf "%s %s\n", a[1], a[2]
break
}
}
}
exit
}
' <(my_command) "${ROWS[#]}"
That awk command could be condensed to one line as:
awk -- 'BEGIN { input = ARGV[1]; while (getline line < input) { sub(/^ +/, "", line); split(line, a, / +/); for (i = 2; i < ARGC; ++i) { if (a[1] == ARGV[i]) {; printf "%s %s\n", a[1], a[2]; break; }; }; }; exit; }' <(my_command) "${ROWS[#]}"
Or better yet just use Bash instead as a whole:
#!/bin/bash
ROWS=(11 12)
while IFS=$' ' read -r LINE; do
IFS='|' read -ra FIELDS <<< "${LINE// +( )/|}"
for R in "${ROWS[#]}"; do
if [[ ${FIELDS[0]} == "$R" ]]; then
echo "${R} ${FIELDS[1]}"
break
fi
done
done < <(my_command)
It should give an output like:
11 File Name1
12 Fi leNa me2

Shell variables aren't expanded inside single-quoted strings. Use the -v option to set an awk variable to the shell variable:
fileName=$(my_command | awk -v i=$i -F "[[:space:]]{2,}+" 'NR==i {print $2}' | egrep "^[[:alnum:]])"
This method avoids having to escape all the $ characters in the awk script, as required in konsolebox's answer.

As you already heard, you need to populate an awk variable from your shell variable to be able to use the desired value within the awk script so thi:
awk -F "[[:space:]]{2,}+" 'NR==$i {print $2}' | egrep "^[[:alnum:]]"
should be this:
awk -v i="$i" -F "[[:space:]]{2,}+" 'NR==i {print $2}' | egrep "^[[:alnum:]]"
Also, though, you don't need awk AND grep since awk can do anything grep van do so you can change this part of your script:
awk -v i="$i" -F "[[:space:]]{2,}+" 'NR==i {print $2}' | egrep "^[[:alnum:]]"
to this:
awk -v i="$i" -F "[[:space:]]{2,}+" '(NR==i) && ($2~/^[[:alnum:]]/){print $2}'
and you don't need a + after a numeric range so you can change {2,}+ to just {2,}:
awk -v i="$i" -F "[[:space:]]{2,}" '(NR==i) && ($2~/^[[:alnum:]]/){print $2}'
Most importantly, though, instead of invoking awk once for every invocation of my_command, you can just invoke it once for all of them, i.e. instead of this (assuming this does what you want):
i=1
for num in rows
do
fileName=$(my_command | awk -v i="$i" -F "[[:space:]]{2,}" '(NR==i) && ($2~/^[[:alnum:]]/){print $2}')
echo "$num $fileName"
$((i++))
done
you can do something more like this:
for num in rows
do
my_command
done |
awk -F '[[:space:]]{2,}' '$2~/^[[:alnum:]]/{print NR, $2}'
I say "something like" because you don't tell us what "my_command", "rows" or "num" are so I can't be precise but hopefully you see the pattern. If you give us more info we can provide a better answer.

It's pretty inefficient to rerun my_command (and awk) every time through the loop just to extract one line from its output. Especially when all you're doing is printing out part of each line in order. (I'm assuming that my_command really is exactly the same command and produces the same output every time through your loop.)
If that's the case, this one-liner should do the trick:
paste -d' ' <(printf '%s\n' $rows) <(my_command |
awk -F '[[:space:]]{2,}+' '($2 ~ /^[::alnum::]/) {print $2}')

Related

If else script in bash using grep and awk

I am trying to make a script to check if the value snp (column $4 of the test file) is present in another file (map file). If so, print the value snp and the value distance taken from the map file (distance is the column $4 of the map file). If the snp value from the test file is not present in the map file, print the snp value but put a 0 (zero) in the second column as distance value.
The script is:
for chr in {1..22};
do
for snp in awk '{print $4}' test$chr.bim
i=$(grep $snp map$chr.txt | wc -l | awk '{print $1}')
if [[ $i == "0" ]]
then
echo "$snp 0" >> position.$chr
else
distance=$(grep $snp map$chr.txt | awk '{print $4}')
echo "$snp $distance" >> position.$chr
fi
done
done
my map file is made like this:
Chromosome Position(bp) Rate(cM/Mb) Map(cM)
chr22 16051347 8.096992 0.000000
chr22 16052618 8.131520 0.010291
chr22 16053624 8.131967 0.018472
and so on..
my test file is made like this:
22 16051347 0 16051347 C A
22 16052618 0 16052618 G T
22 17306184 0 17306184 T G
and so on..
I'm getting the following syntax errors:
position.sh: line 6: syntax error near unexpected token `i=$(grep $snp map$chr.txt | wc -l | awk '{print $1}')'
position.sh: line 6: `i=$(grep $snp map$chr.txt | wc -l | awk '{print $1}')'
Any tip?
The attempt to use awk as the argument to for is basically a syntax error, and you have a number of syntax problems and inefficiencies here.
Try this:
for chr in {1..22}; do
awk '{print $4}' "test$chr.bim" |
while IFS="" read -r snp; do
if ! grep -q "$snp" "map$chr.txt"; then
echo "$snp 0"
else
awk -v snp="$snp" '
$0 ~ snp { print snp, $4 }' "map$chr.txt"
fi >> "position.$chr"
done
done
The entire thing could probably be further refactored to a single Awk script.
for chr in {1..22}; do
awk 'NR == FNR { ++a[$4]; next }
$2 in a { print a[$2], $4; ++found[$2] }
END { for(k in a) if (!found[k]) print a[k], 0 }' \
"test$chr.bim" "map$chr.txt" >> "position.$chr"
done
The correct for syntax for what I'm guessing you wanted would look like
for snp in $(awk '{print $4}' "test$chr.bim"); do
but this has other problems; see don't read lines with for

Filter lines based on certain string and then print only some attributes greater

I have a big text file with million of log lines.
I would like to filter all the lines which satisfy following criteria
url should be url=/v2/testB
totalTime value should be greater than 500
INFO|id=1|totaltime=5000|httpmethod=POST|url=/v1/testA
INFO|id=2|totaltime=200|httpmethod=POST|url=/v2/testB
INFO|id=3|totaltime=1000|httpmethod=POST|url=/v2/testB
INFO|id=4|totaltime=501|httpmethod=POST|url=/v2/testB
result:-
id=3,totaltime=1000
id=4,totaltime=501
I have tried using multiple awk and then putting if block, I wonder, it can be done quickly? Thanks !
while IFS= read -r line; do
value=`echo $line|grep "url=/v2/testB" | awk -F"totaltime=" '{ print $2}'| awk -F"|" '{ print $1}'`
if (( $value > 500 )); then
echo $line
fi
done < file.log
You may use this awk:
awk -F '|' -v OFS=, '$NF == "url=/v2/testB" {v=$3; sub(/^totaltime=/, "", v); if (v+0 > 500) print $2, $3}' file
id=3,totaltime=1000
id=4,totaltime=501
To make it more readable:
awk -F '|' -v OFS=, '
$NF == "url=/v2/testB" {
v = $3
sub(/^totaltime=/, "", v)
if (v+0 > 500)
print $2, $3
}' file
If you have gnu-awk then it can be reduced to:
awk -F '|' -v OFS=, '$NF == "url=/v2/testB" &&
gensub(/^totaltime=/, "", "1", $3)+0 > 500 {print $2, $3}' file
v+0 is shorthand in awk to covert a string value to number.
$ awk -F'|' -v OFS=',' '{split($3,t,/=/)} $5=="url=/v2/testB" && t[2]>500{print $2, $3}' file
id=3,totaltime=1000
id=4,totaltime=501
You seem to be in luck:
awk -F'|' 'BEGIN{FS="|"; OFS=","}
{ url = substr($NF,index($NF,"=")+1)
totaltime = substr($3,index($3,"=")+1)
}
(url == "/v1/testB") && (totaltime+0 > 500) { print $2,$3 }
' file
With your shown samples, please try following awk program.
awk -F'\\||totaltime=' '$NF=="url=/v2/testB" && $4>500{print $2",totaltime="$4}' Input_file
Explanation: Following is the detailed explanation for above code.
Setting field separator by using -F option in awk program.
Setting field separators to | and totaltime= for all the lines of Input_file.
In main program, checking conditions:
a- If $NF(last field) is equal to url=/v2/testB AND
b- 4th field is greater than 500 then do:
print 2nd field of current line followed by string ,totaltime= followed by 4th field as per required output by OP.
All the awk solutions are great, and if that is a solution use them.
If you wanted to fix your Bash effort, you can do:
while IFS='|' read -r id ti; do
[[ "${ti#*=}" -gt 500 ]] && printf "%s,%s\n" "$id" "$ti"
done < <(grep 'url=/v2/testB$' file | cut -d '|' -f 2,3)
Alternatively, you can eliminate cut and keep all five fields:
while IFS='|' read -r c1 c2 c3 c4 c5; do
[[ "${c3#*=}" -gt 500 ]] && printf "%s,%s\n" "$c2" "$c5"
done < <(grep 'url=/v2/testB$' file)
Either prints:
id=3,totaltime=1000
id=4,totaltime=501

Shell awk - Print a position from variable

Here is my string that needs to be parsed.
line='aaa vvv ccc'
I need to print the values one by one.
no_of_users=$(echo $line| wc -w)
If the no_of_users is greater than 1 then I need to print the values one by one.
aaa
vvv
ccc
I used this script.
if [ $no_of_users -gt 1 ]
then
for ((n=1;n<=$no_of_users;n++))
do
-- here is my issue ##echo 'user:'$n $line|awk -F ' ' -vno="${n}" 'BEGIN { print no }'
done
fi
In the { print no } I have to print the value in that position.
You may use this awk:
awk 'NF>1 {OFS="\n"; $1=$1} 1' <<< "$line"
aaa
vvv
ccc
What it does:
NF>1: If number of fields are greater than 1
OFS="\n": Set output field separator to \n
$1=$1: Force restructure of a record
1: Print a record
1st solution: Within single awk could you please try following. Where var is an awk variable which has shell variable line value in it.
awk -v var="$line" '
BEGIN{
num=split(var,arr," ")
if(num>1){
for(i=1;i<=num;i++){ print arr[i] }
}
}'
Explanation: Adding detailed explanation for above.
awk -v var="$line" ' ##Starting awk program and creating var variable which has line shell variable value in it.
BEGIN{ ##Starting BEGIN section of program from here.
num=split(var,arr," ") ##Splitting var into array arr here. Saving its total length into variable num to check it later.
if(num>1){ ##Checking condition if num is greater than 1 then do following.
for(i=1;i<=num;i++){ print arr[i] } ##Running for loop from i=1 to till value of num here and printing arr value with index i here.
}
}'
2nd solution: Adding one more solution tested and written in GNU awk.
echo "$line" | awk -v RS= -v OFS="\n" 'NF>1{$1=$1;print}'
Another option:
if [ $no_of_users -gt 1 ]
then
for ((n=1;n<=$no_of_users;n++))
do
echo 'user:'$n $(echo $line|awk -F ' ' -v x=$n '{printf $x }')
done
fi
You can use grep
echo $line | grep -o '[a-z][a-z]*'
Also with awk:
awk '{print $1, $2, $3}' OFS='\n' <<< "$line"
aaa
vvv
ccc
the key is setting OFS='\n'
Or a really toughie:
printf "%s\n" $line
(note: $line is unquoted)
printf will consume all words in line with word-splitting applied so each word is taken as a single input.
Example Use/Output
$ line='aaa vvv ccc'; printf "%s\n" $line
aaa
vvv
ccc
Using bash:
$ line='aaa vvv'ccc'
$ [[ $line =~ \ ]] && echo -e ${line// /\\n}
aaa
vvv
ccc
$ line=aaa
$ [[ $line =~ \ ]] && echo -e ${line// /\\n}
$
If you are on another shell:
$ line="foo bar baz" bash -c '[[ $line =~ \ ]] && echo -e ${line// /\\n}'
grep -Eq '[[:space:]]' <<< "$line" && xargs printf "%s\n" <<< $line
Do a silent grep for a space in the variable, if true, print with names on separate lines.
awk -v OFS='\n' 'NF>1{$1=$1; print}'
e.g.
$ line='aaa vvv ccc'
$ echo "$line" | awk -v OFS='\n' 'NF>1{$1=$1; print}'
aaa
vvv
ccc
$ line='aaa'
$ echo "$line" | awk -v OFS='\n' 'NF>1{$1=$1; print}'
$
another golfed awk variation
$ awk 'gsub(FS,RS)'
only print if there is a substitution.

Shell Script for combining 3 files

I have 3 files with below data
$cat File1.txt
Apple,May
Orange,June
Mango,July
$cat File2.txt
Apple,Jan
Grapes,June
$cat File3.txt
Apple,March
Mango,Feb
Banana,Dec
I require the below output file.
$Output_file.txt
Apple,May|Jan|March
Orange,June
Mango,July|Feb
Grapes,June
Banana,Dec
Requirement here is the take out the first column and then common data in column 1 in each file need to be searched and second column needs to be "|" separated. If there is no common column, then same needs to be printed in the output file.
I have tried putting this in a while loop, but it takes time as the file size increase. Wanted a simple solution using shell script.
This should work :
#!/bin/bash
for FRUIT in $( cat "$#" | cut -d "," -f 1 | sort | uniq )
do
echo -ne "${FRUIT},"
awk -F "," "\$1 == \"$FRUIT\" {printf(\"%s|\",\$2)}" "$#" | sed 's/.$/\'$'\n/'
done
Run it as :
$ ./script.sh File1.txt File2.txt File3.txt
A purely native-bash solution (calling no external tools, and thus limited only by the performance constraints of bash itself) might look like:
#!/usr/bin/env bash
case $BASH_VERSION in ''|[123].*) echo "ERROR: Bash 4 or newer required" >&2; exit 1;; esac
declare -A items=( )
for file in "$#"; do
while IFS=, read -r key value; do
items[$key]+="|$value"
done <"$file"
done
for key in "${!items[#]}"; do
value=${items[$key]}
printf '%s,%s\n' "$key" "${value#'|'}"
done
...called as ./yourscript File1.txt File2.txt File3.txt
This is fairly easy done with a single awk command:
awk 'BEGIN{FS=OFS=","} {a[$1] = a[$1] (a[$1] == "" ? "" : "|") $2}
END {for (i in a) print i, a[i]}' File{1,2,3}.txt
Orange,June
Banana,Dec
Apple,May|Jan|March
Grapes,June
Mango,July|Feb
If you want output in the same order as strings appear in original files then use this awk:
awk 'BEGIN{FS=OFS=","} !($1 in a) {b[++n] = $1}
{a[$1] = a[$1] (a[$1] == "" ? "" : "|") $2}
END {for (i=1; i<=n; i++) print b[i], a[b[i]]}' File{1,2,3}.txt
Apple,May|Jan|March
Orange,June
Mango,July|Feb
Grapes,June
Banana,Dec

How can I print the duplicates in a file only once?

I have an input file that contains:
123,apple,orange
123,pineapple,strawberry
543,grapes,orange
790,strawberry,apple
870,peach,grape
543,almond,tomato
123,orange,apple
i want the output to be:
The following numbers are repeated:
123
543
is there a way to get this output using awk; i'm writing the script in solaris , bash
sed -e 's/,/ , /g' <filename> | awk '{print $1}' | sort | uniq -d
awk -vFS=',' \
'{KEY=$1;if (KEY in KEYS) { DUPS[KEY]; }; KEYS[KEY]; } \
END{print "Repeated Keys:"; for (i in DUPS){print i} }' \
< yourfile
There are solutions with sort/uniq/cut as well (see above).
If you can live without awk, you can use this to get the repeating numbers:
cut -d, -f 1 my_file.txt | sort | uniq -d
Prints
123
543
Edit: (in response to your comment)
You can buffer the output and decide if you want to continue. For example:
out=$(cut -d, -f 1 a.txt | sort | uniq -d | tr '\n' ' ')
if [[ -n $out ]] ; then
echo "The following numbers are repeated: $out"
exit
fi
# continue...
This script will print only the number of the first column that are repeated more than once:
awk -F, '{a[$1]++}END{printf "The following numbers are repeated: ";for (i in a) if (a[i]>1) printf "%s ",i; print ""}' file
Or in a bit shorter form:
awk -F, 'BEGIN{printf "Repeated "}(a[$1]++ == 1){printf "%s ", $1}END{print ""} ' file
If you want to exit your script in case a dup is found, then you can exit a non-zero exit code. For example:
awk -F, 'a[$1]++==1{dup=1}END{if (dup) {printf "The following numbers are repeated: ";for (i in a) if (a[i]>1) printf "%s ",i; print "";exit(1)}}' file
In your main script you can do:
awk -F, 'a[$1]++==1{dup=1}END{if (dup) {printf "The following numbers are repeated: ";for (i in a) if (a[i]>1) printf "%s ",i; print "";exit(-1)}}' file || exit -1
Or in a more readable format:
awk -F, '
a[$1]++==1{
dup=1
}
END{
if (dup) {
printf "The following numbers are repeated: ";
for (i in a)
if (a[i]>1)
printf "%s ",i;
print "";
exit(-1)
}
}
' file || exit -1

Resources