Conditional print based on array content in bash or awk - bash

I have an input file with the following contents:
SS SC
a 1
b 2
d 5
f 7
I have an input bash array as follows:
echo "${input[*]}"
a b c d e f
I need to create an output to:
1. Print all elements of the array in the 1st column.
2. In the 2nd column, print 0 or 1, based on the presence of the element in the file.
To explain: the array input contains a, b, c, d, e, f. Since a is present in the input file, the output should be a 1, whereas c is missing from the input file, so the output should be c 0.
E.g., expected result:
SS RESULT
a 1
b 1
c 0
d 1
e 0
f 1
I tried to split the bash array in an attempt to iterate over it, but it prints once per input line (the way awk works), and it is getting too difficult to handle.
awk -v par="${input[*]}" 'BEGIN{ n = split(par, a, " ")} {for(i=0;i<=n;i++){printf "%s\n", a[i]}}' input
I am able to do this (minus the header) with a bash for loop and some grep, but I am hoping awk would be shorter, as I need to put this in a YAML file and so need to keep it short.
for item in ${input[@]};do
if ! grep -qE "^${item}" input ;then
echo "$item 0";
else
echo "$item 1";
fi;
done
a 1
b 1
c 0
d 1
e 0
f 1

Using awk to store the values from the first column of the file in an associative array and then checking whether the elements of the bash array exist in it:
#!/usr/bin/env bash
input=(a b c d e f)
awk 'BEGIN { print "SS", "RESULT" }
FNR == NR { vals[$1] = 1; next }
{ print $0, $0 in vals }
' input.txt <(printf "%s\n" "${input[@]}")
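If process substitution is awkward to use (for instance when the whole command has to sit on one line inside a YAML file, as the question mentions), a variant of the same idea is to pass the array as a single -v string and do the membership check in an END block. This is only a sketch along the same lines, not part of the answer above; it assumes the header line of input.txt can simply be skipped with FNR > 1:
awk -v par="${input[*]}" '
BEGIN { print "SS", "RESULT" }
FNR > 1 { vals[$1] = 1 }   # remember first-column values, skipping the SS SC header
END {
n = split(par, arr, " ")   # split() arrays are 1-indexed in awk
for (i = 1; i <= n; i++) print arr[i], (arr[i] in vals ? 1 : 0)
}' input.txt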
Or doing the same thing in pure bash:
#!/usr/bin/env bash
input=(a b c d e f)
declare -A vals
while read -r v _; do
vals[$v]=1
done < input.txt
echo "SS RESULT"
for v in "${input[#]}"; do
if [[ -v vals[$v] ]]; then
printf "%s 1\n" "$v"
else
printf "%s 0\n" "$v"
fi
done
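Run against the sample input.txt, either version should reproduce the expected table from the question:
SS RESULT
a 1
b 1
c 0
d 1
e 0
f 1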

The following code snippet demonstrates how it can be achieved in Perl:
use strict;
use warnings;
use feature 'say';
my @array = qw/a b c d e f/;
my %seen;
$seen{(split)[0]}++ while <DATA>;
say 'SS RESULT';
say $_, ' ', $seen{$_} ? 1 : 0 for @array;
__DATA__
SS SC
a 1
b 2
d 5
f 7
Output
SS RESULT
a 1
b 1
c 0
d 1
e 0
f 1

Related

Skip lines starting with a character and delete lines matching second column lesser than a value

I have a file with the following format:
Qil
Lop
A D E
a 1 10
b 2 21
c 3 22
d 4 5
3 5 9
I need to skip lines that start with the pattern 'Qil', 'Lop', or 'A D E', as well as ones where the third column has a value greater than 10, and save the result in 2 different files with the formats shown below.
Example output files :
Output file 1
Qil
Lop
A D E
a 1 10
d 4 5
3 5 9
Output file 2
a
d
3
My code :
while read -r line; if [[ $line == "A" ]] ||[[ $line == "Q" ]]||[[ $line == "L" ]] ; then
awk '$2 < "11" { print $0 }' test.txt
awk '$2 < "11" { print $1 }' test1.txt
done < input.file
Could you please try the following.
awk '
/^Qil$|^Lop$|^A D E$/{
val=(val?val ORS:"")$0
next
}
$3<=10{
if(!flag){
print val > "file1"
flag=1
}
print > "file1"
if(!a[$1]++){
print $1> "file2"
}
}' Input_file
This will create 2 output files named file1 and file2 as per OP's requirements.
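For example, a sketch of a run (assuming the program above is saved as split.awk and the sample data as Input_file):
awk -f split.awk Input_file
cat file1    # Qil, Lop, A D E, then a 1 10, d 4 5 and 3 5 9
cat file2    # a, d, 3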
This can also be done in a single awk that filters on the third column directly (note that, unlike the previous answer, it does not carry the header block over to the output):
awk '$1 !~ /^[QLA]/ && $3 <= 10' file
a 1 10
d 4 5
3 5 9
If you want to print only the first column then use:
awk '$1 !~ /^[QLA]/ && $3 <= 10 { print $1 }' file
a
d
3

UPDATED: Bash + Awk: Print first X (dynamic) columns and always the last column

#file test.txt
a b c 5
d e f g h 7
gg jj 2
Say X = 3. I need the output like this:
#file out.txt
a b c 5
d e f 7
gg jj 2
NOT this:
a b c 5
d e f 7
gg jj 2 2 <--- WRONG
I've gotten to this stage:
cat test.txt | awk ' { print $1" "$2" "$3" "NF } '
If you're unsure of the total number of fields, then one option would be to use a loop:
awk '{ for (i = 1; i <= 3 && i < NF; ++i) printf "%s ", $i; print $NF }' file
The loop can be avoided by using a ternary:
awk '{ print $1, $2, (NF > 3 ? $3 OFS $NF : $3) }' file
This is slightly more verbose than the approach suggested by 123 but means that you aren't left with trailing white space on the lines with three fields. OFS is the Output Field Separator, a space by default, which is what print inserts between fields when you use a ,.
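For instance, a quick sketch with the same sample file: setting OFS to a comma changes what print inserts between the comma-separated arguments, including inside the ternary:
awk 'BEGIN { OFS = "," } { print $1, $2, (NF > 3 ? $3 OFS $NF : $3) }' file
a,b,c,5
d,e,f,7
gg,jj,2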
Use $ combined with NF:
cat test.txt | awk ' { print $1" "$2" "$3" "$NF } '

Find a number of a file in a range of numbers of another file

I have these two input files:
file1
1 982444
1 46658343
3 15498261
2 238295146
21 47423507
X 110961739
17 7490379
13 31850803
13 31850989
file2
1 982400 982480
1 46658345 46658350
2 14 109
2 5000 9000
2 238295000 238295560
X 110961739 120000000
17 7490200 8900005
And this is my desired output:
1 982444
2 238295146
X 110961739
17 7490379
This is what I want: find the column 1 element of file1 in column 1 of file2. If the number is the same, take the number in column 2 of file1 and check whether it is included in the range given by columns 2 and 3 of file2. If it is included, print the line of file1 in the output.
Maybe it is a little confusing, but I'm doing my best to explain. I have tried some things but I'm far away from a solution, and any help will be really appreciated. In bash, awk or perl please.
Thanks in advance,
Just using awk. The solution doesn't loop through file1 repeatedly.
#!/usr/bin/awk -f
NR == FNR {
# I'm processing file2 since NR still matches FNR
# I'd store the ranges from it on a[] and b[]
# x[] acts as a counter to the number of range pairs stored that's specific to $1
i = ++x[$1]
a[$1, i] = $2
b[$1, i] = $3
# Skip to next record; Do not allow the next block to process a record from file2.
next
}
{
# I'm processing file1 since NR is already greater than FNR
# Let's get the index for the last range first then go down until we reach 0.
# Nothing would happen as well if i evaluates to nothing i.e. $1 doesn't have a range for it.
for (i = x[$1]; i; --i) {
if ($2 >= a[$1, i] && $2 <= b[$1, i]) {
# I find that $2 is within range. Now print it.
print
# We're done so let's skip to the next record.
next
}
}
}
Usage:
awk -f script.awk file2 file1
Output:
1 982444
2 238295146
X 110961739
17 7490379
A similar approach using Bash (version 4.0 or newer):
#!/bin/bash
FILE1=$1 FILE2=$2
declare -A A B X
while read F1 F2 F3; do
(( I = ++X[$F1] ))
A["$F1|$I"]=$F2
B["$F1|$I"]=$F3
done < "$FILE2"
while read -r LINE; do
read F1 F2 <<< "$LINE"
for (( I = X[$F1]; I; --I )); do
if (( F2 >= A["$F1|$I"] && F2 <= B["$F1|$I"] )); then
echo "$LINE"
break # stop at the first matching range, like next in the awk version
fi
done
done < "$FILE1"
Usage:
bash script.sh file1 file2
Let's mix bash and awk:
while read col min max
do
awk -v col=$col -v min=$min -v max=$max '$1==col && min<=$2 && $2<=max' f1
done < f2
Explanation
For each line of file2, read the min and the max, together with the value of the first column.
Given these values, check file1 for lines that have the same first column and whose 2nd column is in the range specified by file2.
Test
$ while read col min max; do awk -v col=$col -v min=$min -v max=$max '$1==col && min<=$2 && $2<=max' f1; done < f2
1 982444
2 238295146
X 110961739
17 7490379
Pure bash, based on Fedorqui's solution:
#!/bin/bash
while read col_2 min max
do
while read col_1 val
do
(( col_1 == col_2 && ( min <= val && val <= max ) )) && echo $col_1 $val
done < file1
done < file2
Another option is a pipeline: build key patterns from the first column of file2 (named input2 here), use them to filter a merged copy of both files, sort, and let a final awk do the range check:
cut -d' ' -f1 input2 | sed 's/^/^/;s/$/\\s/' | \
grep -f - <(cat input2 input1) | sort -n -k1 -k3 | \
awk 'NF==3 {
split(a,b,",");
for (v in b)
if ($2 <= b[v] && $3 >= b[v])
print $1, b[v];
if ($1 != p) a=""}
NF==2 {p=$1;a=a","$2}'
Produces:
X 110961739
1 982444
2 238295146
17 7490379
Here's a Perl solution. It could be much faster but less concise if I built a hash out of file2, but this should be fine.
use strict;
use warnings;
use autodie;
my @bounds = do {
open my $fh, '<', 'file2';
map [ split ], <$fh>;
};
open my $fh, '<', 'file1';
while (my $line = <$fh>) {
my ($key, $val) = split ' ', $line;
for my $bound (@bounds) {
next unless $key eq $bound->[0] and $val >= $bound->[1] and $val <= $bound->[2];
print $line;
last;
}
}
output
1 982444
2 238295146
X 110961739
17 7490379

Search for a column by name in awk

I have a file that has many columns. Let us say "Employee_number", "Employee_name", and "Salary". I want to display all entries in a column by giving all or part of the column name. For example, if my input is "name", I want all the employee names printed. Is it possible to do this in a simple manner using awk?
Thanks
Given a script getcol.awk as follows:
BEGIN {
colname = ARGV[1]
ARGV[1] = ""
getline
for (i = 1; i <= NF; i++) {
if ($i ~ colname) {
break;
}
}
if (i > NF) exit
}
{print $i}
... and the input file test.txt:
apple banana candy deer elephant
A B C D E
A B C D E
A B C D E
A B C D E
A B C D E
A B C D E
A B C D E
... the command:
$ awk -f getcol.awk b <test.txt
... gives the following output:
B
B
B
B
B
B
B
Note that the output text does not include the first line of the test file, which is treated as a header.
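If the matching header cell itself should also appear in the output, one small variation (a sketch along the same lines, not part of the script above) is to print it at the end of the BEGIN block once the column has been found:
BEGIN {
colname = ARGV[1]
ARGV[1] = ""
getline
for (i = 1; i <= NF; i++) {
if ($i ~ colname) {
break
}
}
if (i > NF) exit
print $i   # also echo the matched header field
}
{print $i}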
A simple one-liner will do the trick:
$ cat file
a b c
1 2 3
1 2 3
1 2 3
$ awk -v c="a" 'NR==1{for(i=1;i<=NF;i++)n=$i~c?i:n;next}n{print $n}' file
1
1
1
$ awk -v c="b" 'NR==1{for(i=1;i<=NF;i++)n=$i~c?i:n;next}n{print $n}' file
2
2
2
$ awk -v c="c" 'NR==1{for(i=1;i<=NF;i++)n=$i~c?i:n;next}n{print $n}' file
3
3
3
# no column d so no output
$ awk -v c="d" 'NR==1{for(i=1;i<=NF;i++)n=$i~c?i:n;next}n{print $n}' file
Note: as your requirement is for name to match employee_name, just be aware that if you give employee you will get the last column matching employee; this is easily changed, however (see the sketch below).
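For example, to take the first matching column instead of the last, one possible tweak (a sketch, not part of the answer above; it assumes a header such as Employee_number Employee_name Salary from the question) is to break out of the header loop at the first hit:
$ awk -v c="Employee" 'NR==1{for(i=1;i<=NF;i++)if($i~c){n=i;break};next}n{print $n}' file
With that header, c="Employee" would now select Employee_number rather than Employee_name.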

How can I send a parameter to awk using a shell script

I have this file
myfile
a b c d e 1
b c s d e 1
a b d e f 2
d f g h j 2
awk 'if $6==$variable {print $0}' myfile
How can I use this code in a shell script that gets $variable as a parameter supplied by the user on the command line?
You can use awk's -v flag. And since awk prints matching records by default, you can try, for example:
variable=1
awk -v var=$variable '$6 == var' file.txt
Results:
a b c d e 1
b c s d e 1
EDIT:
The command is essentially the same, wrapped up in shell. You can use it in a shell script with multiple arguments, invoked like this: script.sh 2 j
Contents of script.sh:
command=$(awk -v var_one=$1 -v var_two=$2 '$6 == var_one && $5 == var_two' file.txt)
echo -e "$command"
Results:
d f g h j 2
This is question 24 in the comp.unix.shell FAQ (http://cfajohnson.com/shell/cus-faq-2.html#Q24), but the most commonly used alternatives, with the most common reasons to pick between the two, are:
awk -v var=value '<script>' file1 file2
if you want the variable to be populated in the BEGIN section, or:
awk '<script>' file1 var=value file2
if you do not want the variable to be populated in the BEGIN section and/or need to change the variable's value between files.
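A quick sketch of the difference (using /dev/null as a stand-in input file): with -v the variable is already set when BEGIN runs, whereas a command-line assignment is only processed when awk reaches it in the argument list, so BEGIN still sees it empty:
awk -v var=1 'BEGIN { print "var is", var }' /dev/null    # prints: var is 1
awk 'BEGIN { print "var is", var }' var=1 /dev/null       # prints: var is (empty at BEGIN time)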
