Merging two outputs in a shell script

I have the output of 2 commands:
Output of the first command:
A B
C D
E F
G H
Output of the second command:
I J
K L
M B
I want to merge both outputs, and if a value in the second column is the same in both outputs, I'll take the entry from the first output.
So, my output should be:
A B
C D
E F
G H
I J
K L
(Not taking M B, since B is already there in the first entry A B, so preference goes to the first output.)
Can I do this using a shell script? Is there a command for it?

You can use awk:
awk 'FNR==NR{a[$2];print;next} !($2 in a)' file1 file2
A B
C D
E F
G H
I J
K L
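For readability, here is the same one-liner written out with comments (a sketch, using the same file1 and file2 names):
awk '
FNR==NR {        # true only while reading the first file (file1)
    a[$2]        # remember each 2nd-column value as an array key
    print        # print every file1 line unchanged
    next         # skip the rest of the script for file1 lines
}
!($2 in a)       # for file2 lines, print only if $2 was not seen in file1
' file1 file2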

If the order of entries is not important, you can sort on the 2nd column and remove duplicates:
sort -u -k2 file1 file2
Both -u and -k are specified in the POSIX standard.
This wouldn't work if there are repeated entries in the 2nd column of file1.

Related

Add a specific string at the end of each line

I have a mainfile with 4 columns, such as:
a b c d
e f g h
i j k l
In another file, I have one line of text corresponding to each respective line in the mainfile, which I want to add as a new column to the mainfile, like this:
a b c d x
e f g h y
i j k l z
Is this possible in bash? I can only manage to add the same string to the end of every line.
Two ways you can do this:
1) paste file1 file2
2) Iterate over both files, combine them line by line, and write to a new file (a sketch of this is shown below).
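A minimal sketch of the second approach in plain bash, assuming the files are named file1 and file2 and have the same number of lines:
# read file1 and file2 in lockstep and join corresponding lines with a space
while IFS= read -r line1 <&3 && IFS= read -r line2 <&4; do
    printf '%s %s\n' "$line1" "$line2"
done 3<file1 4<file2 > newfile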
You could use GNU parallel for that:
fe-laptop-m:test fe$ cat first
a b c d
e f g h
i j k l
fe-laptop-m:test fe$ cat second
x
y
z
fe-laptop-m:test fe$ parallel echo ::::+ first second
a b c d x
e f g h y
i j k l z
Did I understand correctly what you are trying to achieve?
This might work for you (GNU sed):
sed -E 's#(^.*) .*#/^\1/s/$/ &/#' file2 | sed -f - file1
This creates a sed script from file2 that uses a regexp to match a line in file1 and, if it matches, appends the contents of that line from file2 to the matched line.
N.B. This is independent of the order and length of file1.
You can try using pr
pr -mts' ' file1 file2

Matching contents of one file with another and returning second column

So I have two txt files
file1.txt
s
j
z
z
e
and file2.txt
s h
f a
j e
k m
z l
d p
e o
and what I want to do is match the first letter of file1 with the first letter of file2 and return the second column of file2. So, for example, the expected output would be:
h
e
l
l
o
I'm trying to use join file1.txt file2.txt, but that just prints out the entire second file. Not sure how to fix this. Thank you.
This is an awk classic:
$ awk 'NR==FNR{a[$1]=$2;next}{print a[$1]}' file2 file1
h
e
l
l
o
Explained:
$ awk '
NR==FNR {          # processing file2
    a[$1]=$2       # hash records: first field as key, second field as value
    next
}
{                  # processing file1 (the second file on the command line)
    print a[$1]    # output the value stored for this key
}' file2 file1
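If you would rather use join, note that it requires both inputs to be sorted on the join field, which also changes the output order; a sketch, assuming a shell with process substitution such as bash:
join -o 2.2 <(sort file1.txt) <(sort file2.txt)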

Converting four columns to two using Linux commands

I am wondering how one could merge four columns into two in the following manner (using the awk command, or other possible commands).
For example,
Old:
A B C D
E F G H
I J K L
M N O P
.
.
.
New:
A B
C D
E F
G H
I J
K L
M N
O P
.
.
Thanks so much!
That's actually quite easy with awk, as per the following transcript:
pax> cat inputFile
A B C D
E F G H
pax> awk '{printf "%s %s\n%s %s\n", $1, $2, $3, $4}' <inputFile
A B
C D
E F
G H
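If the lines may contain more than four (but still an even number of) columns, the same idea generalizes with a loop over the fields, a sketch:
awk '{ for (i = 1; i < NF; i += 2) print $i, $(i+1) }' <inputFile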
How about using xargs here? Could you please try the following:
xargs -n 2 < Input_file
Output will be as follows.
A B
C D
E F
G H
I J
K L
M N
O P
With GNU sed:
$ sed 's/ /\n/2' file
This replaces the 2nd space on each line with a newline.

Linux Bash count and summarize by unique columns

I have a text file with lines like this (in Linux Bash):
A B C D
A B C J
E B C P
E F G N
E F G P
A B C Q
H F S L
G Y F Q
H F S L
I need to find the lines with unique values for the first 3 columns, print their count, and then print the summarized last column for each unique line, so the result looks like this:
3 A B C D,J,Q
1 E B C P
2 E F G N,P
1 G Y F Q
2 H F S L
What I have tried:
cat FILE | sort -k1,3 | uniq -f3 -c | sort -k3,5nr
Does anyone have any advice?
Thanks in advance!
The easiest is to do the following:
awk '{key=$1 OFS $2 OFS $3; a[key]=a[key]","$4; c[key]++}
END{for(key in a) { print c[key],key,substr(a[key],2) }}' <file>
If you do not want any duplicates in the last column, you can do:
awk '{ key=$1 OFS $2 OFS $3; c[key]++ }
     !gsub(","$4,","$4,a[key]) { a[key]=a[key]","$4 }
     END{ for(key in a) { print c[key],key,substr(a[key],2) } }' <file>
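In both scripts the for (key in a) loop visits keys in an unspecified order; if you want the output ordered as in your example, one option is to pipe the result through sort, a sketch:
awk '{key=$1 OFS $2 OFS $3; a[key]=a[key]","$4; c[key]++}
     END{for(key in a) print c[key],key,substr(a[key],2)}' file | sort -k2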
Could you please try the following and let me know if it helps you.
This will give you output in the same sequence in which $1, $2, and $3 first occur in the Input_file.
awk '
!a[$1,$2,$3]++{                       # first time this 3-field key is seen
    b[++count]=$1 FS $2 FS $3         # remember the key in input order
}
{
    c[$1,$2,$3]=c[$1,$2,$3]?c[$1,$2,$3] "," $4:$0   # collect $4 values (first hit keeps the whole line)
    d[$1 FS $2 FS $3]++               # count occurrences of the key
}
END{
    for(i=1;i<=count;i++){
        print d[b[i]],c[b[i]]         # count, then key with the collected $4 list
    }
}
' SUBSEP=" " Input_file
Another using GNU awk and 2d arrays for removing duplicates in $4:
$ awk '{
    i=$1 OFS $2 OFS $3                       # key to hash
    a[i][$4]                                 # store each $4 as a separate element
    c[i]++                                   # count key references
}
END {
    for(i in a) {
        k=1                                  # comma counter for output
        printf "%s %s ",c[i],i               # output count and key
        for(j in a[i])                       # each a[i][j] element
            printf "%s%s",((k++)==1?"":","),j  # output commas and elements
        print ""                             # line ending
    }
}' file
Output in default random order:
2 E F G N,P
3 A B C Q,D,J
1 G Y F Q
1 E B C P
2 H F S L
Since we are using GNU awk, the order of the output can easily be controlled by setting PROCINFO["sorted_in"]="@ind_str_asc":
3 A B C D,J,Q
1 E B C P
2 E F G N,P
1 G Y F Q
2 H F S L
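For example, a compact sketch of the same script with that setting added in a BEGIN block (gawk only; the inner loop over a[i] is then also visited in sorted order):
$ awk 'BEGIN { PROCINFO["sorted_in"]="@ind_str_asc" }
  { i=$1 OFS $2 OFS $3; a[i][$4]; c[i]++ }
  END { for(i in a) { k=1; printf "%s %s ",c[i],i; for(j in a[i]) printf "%s%s",((k++)==1?"":","),j; print "" } }' file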
You could utilize GNU datamash:
$ cat input
A B C D
A B C J
E B C P
E F G N
E F G P
A B C Q
H F S L
G Y F Q
H F S L
$ datamash -t' ' --sort groupby 1,2,3 unique 4 count 4 < input
A B C D,J,Q 3
E B C P 1
E F G N,P 2
G Y F Q 1
H F S L 2
This unfortunately outputs the count as the last column. If it is absolutely necessary for it to be the first column, you will have to reformat it:
$ datamash -t' ' --sort groupby 1,2,3 unique 4 count 4 < input | awk '{$0=$NF FS $0; NF--}1'
3 A B C D,J,Q
1 E B C P
2 E F G N,P
1 G Y F Q
2 H F S L

Search for a column by name in awk

I have a file that has many columns, let us say "Employee_number", "Employee_name", "Salary". I want to display all entries in a column by giving all or part of the column name. For example, if my input is "name", I want all the employee names printed. Is it possible to do this in a simple manner using awk?
Thanks
Given a script getcol.awk as follows:
BEGIN {
    colname = ARGV[1]
    ARGV[1] = ""
    getline
    for (i = 1; i <= NF; i++) {
        if ($i ~ colname) {
            break;
        }
    }
    if (i > NF) exit
}
{print $i}
... and the input file test.txt:
apple banana candy deer elephant
A B C D E
A B C D E
A B C D E
A B C D E
A B C D E
A B C D E
A B C D E
... the command:
$ awk -f getcol.awk b <test.txt
... gives the following output:
B
B
B
B
B
B
B
Note that the output text does not include the first line of the test file, which is treated as a header.
A simple one-liner will do the trick:
$ cat file
a b c
1 2 3
1 2 3
1 2 3
$ awk -v c="a" 'NR==1{for(i=1;i<=NF;i++)n=$i~c?i:n;next}n{print $n}' file
1
1
1
$ awk -v c="b" 'NR==1{for(i=1;i<=NF;i++)n=$i~c?i:n;next}n{print $n}' file
2
2
2
$ awk -v c="c" 'NR==1{for(i=1;i<=NF;i++)n=$i~c?i:n;next}n{print $n}' file
3
3
3
# no column d so no output
$ awk -v c="d" 'NR==1{for(i=1;i<=NF;i++)n=$i~c?i:n;next}n{print $n}' file
Note: since your requirement is for name to match Employee_name, just be aware that if you give employee you will get the last column matching employee; this is easily changed, however (see the sketch below).
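One way to keep the first matching column instead of the last is to set n only while it is still unset, a sketch (shown with the same sample file; the difference only matters when several column names match):
$ awk -v c="b" 'NR==1{for(i=1;i<=NF;i++)if(!n && $i~c)n=i;next}n{print $n}' file
2
2
2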
