Search for a column by name in awk - shell

I have a file with many columns, say "Employee_number" "Employee_name" "Salary". I want to display all entries in a column by giving all or part of the column name. For example, if my input is "name", I want all the employee names printed. Is it possible to do this in a simple manner using awk?
Thanks

Given a script getcol.awk as follows:
BEGIN {
    colname = ARGV[1]
    ARGV[1] = ""
    getline
    for (i = 1; i <= NF; i++) {
        if ($i ~ colname) {
            break
        }
    }
    if (i > NF) exit
}
{ print $i }
... and the input file test.txt:
apple banana candy deer elephant
A B C D E
A B C D E
A B C D E
A B C D E
A B C D E
A B C D E
A B C D E
... the command:
$ awk -f getcol.awk b <test.txt
... gives the following output:
B
B
B
B
B
B
B
Note that the output text does not include the first line of the test file, which is treated as a header.
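Applied to the original question's data (a sketch; the file name employees.txt and its contents are hypothetical), a partial name such as name selects the Employee_name column:

```shell
# Save the script from above as getcol.awk (reproduced here so the
# example is self-contained).
cat > getcol.awk <<'EOF'
BEGIN {
    colname = ARGV[1]
    ARGV[1] = ""
    getline
    for (i = 1; i <= NF; i++)
        if ($i ~ colname) break
    if (i > NF) exit
}
{ print $i }
EOF

# Hypothetical sample matching the question's layout.
printf 'Employee_number Employee_name Salary\n1 alice 50000\n2 bob 60000\n' > employees.txt

# "name" matches only the Employee_name header, so that column is printed:
awk -f getcol.awk name < employees.txt
# alice
# bob
```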

A simple one-liner will do the trick:
$ cat file
a b c
1 2 3
1 2 3
1 2 3
$ awk -v c="a" 'NR==1{for(i=1;i<=NF;i++)n=$i~c?i:n;next}n{print $n}' file
1
1
1
$ awk -v c="b" 'NR==1{for(i=1;i<=NF;i++)n=$i~c?i:n;next}n{print $n}' file
2
2
2
$ awk -v c="c" 'NR==1{for(i=1;i<=NF;i++)n=$i~c?i:n;next}n{print $n}' file
3
3
3
# no column d so no output
$ awk -v c="d" 'NR==1{for(i=1;i<=NF;i++)n=$i~c?i:n;next}n{print $n}' file
Note: your requirement is for name to match Employee_name, but be aware that if you give employee, you will get the last column matching employee; this is easily changed, however.
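If you need an exact header match rather than a substring match, one option (a sketch; the file emp.txt and its contents are hypothetical) is to anchor the pattern with ^ and $, so a bare employee matches nothing while employee_name still works:

```shell
# Hypothetical file with two headers that share a prefix.
printf 'employee_number employee_name salary\n1 alice 100\n2 bob 200\n' > emp.txt

# "^" c "$" builds an anchored dynamic regex, so only a full-header
# match sets n; a bare "employee" would select no column at all.
awk -v c="employee_name" \
    'NR==1{for(i=1;i<=NF;i++) if($i ~ "^"c"$") n=i; next} n{print $n}' emp.txt
# alice
# bob
```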

Related

Compare names and numbers in two files and output more

For example, there are 2 files:
$ cat file1.txt
e 16
a 9
c 14
b 9
f 25
g 7
$ cat file2.txt
a 10
b 12
c 15
e 8
g 7
Comparing these two files with the command grep -xvFf "$dir2" "$dir1" | tee "$dir3" (directory dir1 contains file1, directory dir2 contains file2, and the result lands in dir3), we get the following output:
$ cat file3.txt
e 16
a 9
c 14
b 9
f 25
Now I need to compare the output in file3 against file2 and keep in file3 only those entries where the number next to the letter has become greater; if the number is equal to or less than the value in file2, it should not be written to file3. That is, the contents of file3 should be:
$ cat file3.txt
e 16
f 25
{m,g}awk 'FNR < NR ? __[$!_]<+$NF : (__[$!_]=+$NF)<_' f2.txt f1.txt
e 16
f 25
If you really want to clump it all into one shot:
mawk '(__[$!!(_=NF)]+= $_ * (NR==FNR)) < +$_' f2.txt f1.txt
One awk idea:
awk '
FNR==NR { a[$1]=$2; next } # 1st file: save line in array a[]
($1 in a) && ($2 > a[$1]) # 2nd file: print current line if 1st field is an index in array a[] *AND* 2nd field is greater than the corresponding value from array a[]
!($1 in a) # 2nd file: print current line if 1st field is not an index in array a[]
' file2.txt file1.txt
This generates:
e 16
f 25
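Since the question wants the result to end up in file3.txt, the same idea can be written as a single OR-combined condition and redirected (a sketch; the condition merges the two pattern lines above):

```shell
# Recreate the question's inputs (contents taken from the question).
printf 'e 16\na 9\nc 14\nb 9\nf 25\ng 7\n' > file1.txt
printf 'a 10\nb 12\nc 15\ne 8\ng 7\n'      > file2.txt

# Keep a line from file1 if its key is absent from file2, or if its
# value grew; write the result to file3.txt.
awk 'FNR==NR{a[$1]=$2; next} !($1 in a) || $2 > a[$1]' file2.txt file1.txt > file3.txt
cat file3.txt
# e 16
# f 25
```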

Conditional print based on array content in bash or awk

I have an input file with following contents:
SS SC
a 1
b 2
d 5
f 7
I have an input bash array as follow:
echo "${input[*]}"
a b c d e f
I need to create an output to:
1. print the all elements of the array in 1st column
2. In second column, I need to print 0 or 1, based on the presence of the element.
To explain this, in the input array called input, I have a,b,c,d,e,f. Now a is present in input file, so the output should be a 1, whereas c is missing in the input file, so the output should be c 0 in the output.
Eg: Expected result:
SS RESULT
a 1
b 1
c 0
d 1
e 0
f 1
I tried to split the bash array in an attempt to iterate over it, but it prints for every line (the way awk works), and it is getting too difficult to handle.
awk -v par="${input[*]}" 'BEGIN{ n = split(par, a, " ")} {for(i=0;i<=n;i++){printf "%s\n", a[i]}}' input
I am able to do this (minus the header) with a bash for loop and some grep, but I am hoping awk would be shorter, as I need to put this in a YAML file and want to keep it short.
for item in "${input[@]}"; do
if ! grep -qE "^${item}" input ;then
echo "$item 0";
else
echo "$item 1";
fi;
done
a 1
b 1
c 0
d 1
e 0
f 1
Using awk to store the values in the first column of the file in an associative array and then see if the elements of the array exist in it:
#!/usr/bin/env bash
input=(a b c d e f)
awk 'BEGIN { print "SS", "RESULT" }
FNR == NR { vals[$1] = 1; next }
{ print $0, ($0 in vals) }
' input.txt <(printf "%s\n" "${input[@]}")
Or doing the same thing in pure bash:
#!/usr/bin/env bash
input=(a b c d e f)
declare -A vals
while read -r v _; do
vals[$v]=1
done < input.txt
echo "SS RESULT"
for v in "${input[@]}"; do
if [[ -v vals[$v] ]]; then
printf "%s 1\n" "$v"
else
printf "%s 0\n" "$v"
fi
done
The following code snippet demonstrates how it can be achieved in Perl:
use strict;
use warnings;
use feature 'say';
my @array = qw/a b c d e f/;
my %seen;
$seen{(split)[0]}++ while <DATA>;
say 'SS RESULT';
say $_, ' ', $seen{$_} ? 1 : 0 for @array;
__DATA__
SS SC
a 1
b 2
d 5
f 7
Output
SS RESULT
a 1
b 1
c 0
d 1
e 0
f 1

Linux Bash count and summarize by unique columns

I have a text file with lines like this (in Linux Bash):
A B C D
A B C J
E B C P
E F G N
E F G P
A B C Q
H F S L
G Y F Q
H F S L
I need to find the lines with unique values for the first 3 columns, print their count and then print summarized last column for each unique line, so the result is like this:
3 A B C D,J,Q
1 E B C P
2 E F G N,P
1 G Y F Q
2 H F S L
What I have tried:
cat FILE | sort -k1,3 | uniq -f3 -c | sort -k3,5nr
Do you have any advice?
Thanks in advance!
The easiest is to do the following:
awk '{key=$1 OFS $2 OFS $3; a[key]=a[key]","$4; c[key]++}
END{for(key in a) { print c[key],key,substr(a[key],2) }}' <file>
If you do not want any duplication, you can do:
awk '{ key=$1 OFS $2 OFS $3; c[key]++ }
!gsub(","$4,","$4,a[key]) { a[key]=a[key]","$4 }
END{for(key in a) { print c[key],key,substr(a[key],2) }}' <file>
Could you please try the following and let me know if it helps.
This will give output in the same sequence as the occurrence of $1, $2, and $3 in the Input_file.
awk '
!a[$1,$2,$3]++{
b[++count]=$1 FS $2 FS $3
}
{
c[$1,$2,$3]=c[$1,$2,$3]?c[$1,$2,$3] "," $4:$0
d[$1 FS $2 FS $3]++
}
END{
for(i=1;i<=count;i++){
print d[b[i]],c[b[i]]
}
}
' SUBSEP=" " Input_file
Another using GNU awk and 2d arrays for removing duplicates in $4:
$ awk '{
i=$1 OFS $2 OFS $3 # key to hash
a[i][$4] # store each $4 to separate element
c[i]++ # count key references
}
END {
for(i in a) {
k=1 # comma counter for output
printf "%s %s ",c[i],i # output count and key
for(j in a[i]) # each a[]i[j] element
printf "%s%s",((k++)==1?"":","),j # output commas and elements
print "" # line-ending
}
}' file
Output in default random order:
2 E F G N,P
3 A B C Q,D,J
1 G Y F Q
1 E B C P
2 H F S L
Since we are using GNU awk, the order of output can easily be controlled by setting PROCINFO["sorted_in"]="@ind_str_asc":
3 A B C D,J,Q
1 E B C P
2 E F G N,P
1 G Y F Q
2 H F S L
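For reference, a sketch of the full invocation with the ordering line added (GNU awk only; both PROCINFO["sorted_in"] and arrays of arrays are gawk extensions):

```shell
gawk 'BEGIN { PROCINFO["sorted_in"] = "@ind_str_asc" }  # order for(... in ...) loops by index
{
  i = $1 OFS $2 OFS $3   # key to hash
  a[i][$4]               # store each $4 as a separate element
  c[i]++                 # count key references
}
END {
  for (i in a) {         # keys now iterate in ascending string order
    k = 1
    printf "%s %s ", c[i], i
    for (j in a[i]) printf "%s%s", (k++ == 1 ? "" : ","), j
    print ""
  }
}' file
```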
You could utilize GNU datamash:
$ cat input
A B C D
A B C J
E B C P
E F G N
E F G P
A B C Q
H F S L
G Y F Q
H F S L
$ datamash -t' ' --sort groupby 1,2,3 unique 4 count 4 < input
A B C D,J,Q 3
E B C P 1
E F G N,P 2
G Y F Q 1
H F S L 2
This unfortunately outputs the count as the last column. If it is absolutely necessary for it to be the first column, you will have to reformat it:
$ datamash -t' ' --sort groupby 1,2,3 unique 4 count 4 < input | awk '{$0=$NF FS $0; NF--}1'
3 A B C D,J,Q
1 E B C P
2 E F G N,P
1 G Y F Q
2 H F S L

Merging two outputs in shell script

I have output of 2 commands like:
op of first cmd:
A B
C D
E F
G H
op of second cmd:
I J
K L
M B
I want to merge both outputs, and if a value in the second column is the same in both outputs, I'll take the entry from the 1st output.
So my output should be:
A B
C D
E F
G H
I J
K L
// not taking (M B) since B is already there in the first entry (A B), so giving preference to the first output
Can I do this using a shell script? Is there any command for it?
You can use awk:
awk 'FNR==NR{a[$2];print;next} !($2 in a)' file1 file2
A B
C D
E F
G H
I J
K L
If the order of entries is not important, you can sort on the 2nd column and uniquefy:
sort -u -k2 file1 file2
Both -u and -k are specified in the POSIX standard
This wouldn't work if there are repeated entries in the 2nd column of file1.
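The caveat is easy to demonstrate (a sketch with hypothetical data): when column 2 repeats inside file1 itself, sort -u collapses those lines as well, so one of them is silently lost. Which of the tied lines survives is implementation-dependent, since -u compares only the key.

```shell
# Column 2 repeats "B" within f1 (hypothetical data).
printf 'A B\nC D\nX B\n' > f1
printf 'I J\n'           > f2

# Only one of the two "B" lines survives the -u deduplication.
sort -u -k2 f1 f2
```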

how can I send parameter to awk using shell script

I have this file
myfile
a b c d e 1
b c s d e 1
a b d e f 2
d f g h j 2
awk 'if $6==$variable {print $0}' myfile
How can I use this code in a shell script that gets $variable as a parameter from the user at the command prompt?
You can use awk's -v flag. And since awk prints by default, you can try for example:
variable=1
awk -v var=$variable '$6 == var' file.txt
Results:
a b c d e 1
b c s d e 1
EDIT:
The command is essentially the same, wrapped up in shell. You can use it in a shell script with multiple arguments, like this: script.sh 2 j
Contents of script.sh:
command=$(awk -v var_one=$1 -v var_two=$2 '$6 == var_one && $5 == var_two' file.txt)
echo -e "$command"
Results:
d f g h j 2
This is question 24 in the comp.unix.shell FAQ (http://cfajohnson.com/shell/cus-faq-2.html#Q24), but the most commonly used alternatives, with the most common reasons to pick between the two, are:
awk -v var=value '<script>' file1 file2
if you want the variable to be populated in the BEGIN section
or:
awk '<script>' file1 var=value file2
if you do not want the variable to be populated in the BEGIN section and/or need to change the variables value between files
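The difference is easy to see with a throwaway script (a sketch; data.txt is a hypothetical one-line file): with -v the variable is already set when BEGIN runs, while with the file-position form it is still empty in BEGIN and only becomes visible once record processing starts:

```shell
echo x > data.txt

# -v form: var is visible in BEGIN.
awk -v var=1 'BEGIN { print "BEGIN sees: [" var "]" }' data.txt
# BEGIN sees: [1]

# File-position form: var is empty in BEGIN, and is assigned
# just before data.txt is read.
awk 'BEGIN { print "BEGIN sees: [" var "]" }
     { print "body sees:  [" var "]" }' var=2 data.txt
# BEGIN sees: []
# body sees:  [2]
```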
