How to align rows from two different files by similitude? [duplicate] - bash

This question already has answers here:
Inner join on two text files
(5 answers)
Closed 3 years ago.
I need help aligning the rows of two files by matching the values in column 2 of file 1 against column 1 of file 2.
file 1:
1 d 3
2 e 4
5 o 1
file 2:
e 6
o 5
d 8
I want to get
1 d 3 d 8
2 e 4 e 6
5 o 1 o 5

Try using the join command:
join -1 2 -2 1 -o 1.1,1.2,1.3,2.1,2.2 <(sort -k2,2 file1) <(sort -k1,1 file2)
output:
1 d 3 d 8
2 e 4 e 6
5 o 1 o 5
join requires both inputs to be sorted on the join field. The files in the question are not, so the command sorts each one on its key column first.
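If sorting is undesirable, for example because the rows of file 1 should keep their original order, an awk hash join is one possible alternative. The sketch below assumes whitespace-separated fields, that file 2 fits in memory, and that each key occurs at most once in file 2. The scratch directory and the variable names (workdir, aligned) are only there for illustration.

```shell
# Recreate the two sample files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' '1 d 3' '2 e 4' '5 o 1' > "$workdir/file1"
printf '%s\n' 'e 6' 'o 5' 'd 8' > "$workdir/file2"

# First pass (file2): remember each whole line under its column-1 key.
# Second pass (file1): if column 2 is a known key, print both lines side by side.
aligned=$(awk 'NR == FNR { row[$1] = $0; next }
               $2 in row { print $0, row[$2] }' "$workdir/file2" "$workdir/file1")

printf '%s\n' "$aligned"
rm -rf "$workdir"
```

Rows of file 1 without a partner in file 2 are dropped, which mirrors the default inner-join behaviour of join.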

If both files have exactly the same keys (and number of lines), you can use paste:
paste -d' ' <(sort -k2,2 file1) <(sort -k1,1 file2)

Related

How to sort the file based on last column in unix using sort command?

a 1
b 2 4
c 3
d 4 5 7
e 4 6
f 5
How can we use sort to print the lines ordered by their last column, like this?
a 1
c 3
b 2 4
f 5
e 4 6
d 4 5 7
We can get this result by prefixing each line with its last field using awk, sorting on that prefix, and cutting it off again:
$ awk '{print $NF,$0}' file.txt | sort -n | cut -f2- -d' '
a 1
c 3
b 2 4
f 5
e 4 6
d 4 5 7
Alternatively, reverse each line, sort numerically, and reverse the lines back:
rev Input_file | sort -nk1.1 | rev
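One caveat with the rev approach: rev reverses every character, so a last column of 10 is read as 01 by sort. It is therefore only safe while the last column is a single digit. A decorate-sort-undecorate pipeline avoids that. The sketch below reuses the question's data plus one hypothetical row (g 10) and uses a tab as the temporary separator, assuming the data itself contains no tabs.

```shell
# Sample data from the question plus one hypothetical row ("g 10") whose
# last column has two digits.
input=$(printf '%s\n' 'a 1' 'b 2 4' 'c 3' 'd 4 5 7' 'e 4 6' 'f 5' 'g 10')

# Decorate: prefix each line with its last field and a tab.
# Sort numerically on that prefix, then strip it again (cut splits on tab by default).
sorted=$(printf '%s\n' "$input" |
  awk '{ print $NF "\t" $0 }' |
  sort -n |
  cut -f2-)

printf '%s\n' "$sorted"
```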

Bash shell: convert three line formats in one file to another format in a single pass

cat file1.txt
set A B 1
set C D E 2
set E F 3 3 3 3 3 3
cat file2.txt
A;B;1;
C;D.E;2;
E;F;3 3 3 3 3 3;
Please help me convert the format of file1.txt into that of file2.txt (file2.txt is the desired output). I have shown only three lines of file1.txt as an example, but the real file contains many lines in these same three formats, so the command should handle any input made up of them.
echo "set A B 1
set C D E 2
set E F 3 3 3 3 3 3 " | sed -r 's/set (.) /\1;/;s/([A-Z])*( ([A-Z]))/\1.\3/g;s/([A-Z]) ([0-9])/\1;\2/;s/ ?$/;/'
A;B;1;
C;D.E;2;
E;F;3 3 3 3 3 3;
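A field-based awk version may be easier to adapt than the chained substitutions. The rule below is only inferred from the three sample lines and is an assumption: the first field after set stands alone, any further non-numeric fields are joined with ".", numeric fields are joined with spaces, and each group is closed with ";".

```shell
# The three sample lines from the question.
input=$(printf '%s\n' 'set A B 1' 'set C D E 2' 'set E F 3 3 3 3 3 3')

converted=$(printf '%s\n' "$input" | awk '
  {
    names = ""; nums = ""
    # Field 1 is the literal "set" and field 2 the leading name.
    # Split the remaining fields into a name group and a number group.
    for (i = 3; i <= NF; i++) {
      if ($i ~ /^[0-9]+$/)
        nums = nums (nums == "" ? "" : " ") $i
      else
        names = names (names == "" ? "" : ".") $i
    }
    print $2 ";" names ";" nums ";"
  }')

printf '%s\n' "$converted"
```

Because awk splits on runs of whitespace, trailing spaces in the input do not need special handling here.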

shell command: join crashes for large files?

I have two files; 1.txt and 2.txt
1.txt has the following content:
a 1 2 3 4 5
b 4 5 6 7 7
c 4 5 6 7 6
d 6 5 4 3 2
and 2.txt;
b
d
I need to extract those lines from 1.txt whose first fields match the first fields of 2.txt;
b 4 5 6 7 7
d 6 5 4 3 2
I thought a simple join command should work for me:
join 1.txt 2.txt
But unfortunately, the command produces just a couple of lines, even though both files are pretty large.
I cannot figure out what's going on.
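The question does not show the real data, so the cause can only be guessed. A frequent reason for join silently dropping lines is input that is not sorted on the join field, or that was sorted under a different locale than the one join runs in. The sketch below uses deliberately unsorted copies of the sample data, sorts both files under LC_ALL=C before joining, and also shows an awk filter that needs no sorting. The scratch directory and variable names are illustrative.

```shell
workdir=$(mktemp -d)
# Deliberately unsorted copies of the sample data.
printf '%s\n' 'd 6 5 4 3 2' 'a 1 2 3 4 5' 'c 4 5 6 7 6' 'b 4 5 6 7 7' > "$workdir/1.txt"
printf '%s\n' 'd' 'b' > "$workdir/2.txt"

# Sort both inputs on the join field, using the same collation for sort and join.
LC_ALL=C sort -k1,1 "$workdir/1.txt" > "$workdir/1.sorted"
LC_ALL=C sort -k1,1 "$workdir/2.txt" > "$workdir/2.sorted"
joined=$(LC_ALL=C join "$workdir/1.sorted" "$workdir/2.sorted")

# Alternative without sorting: load the keys from 2.txt, then keep the
# matching lines of 1.txt in their original order.
filtered=$(awk 'NR == FNR { want[$1]; next } $1 in want' "$workdir/2.txt" "$workdir/1.txt")

printf '%s\n' "$joined"
rm -rf "$workdir"
```

The awk variant keeps the whole key list in memory, so it suits the case where 2.txt is the smaller file.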

bash print complete lines where just the first n characters match

I have created a sorted list of hashes for certain files
ffb01af8fda1e5c3b74d1eb384d021be1f1577c3 *./Pictures/camera/London 170713/P9110042.JPG
ffb01af8fda1e5c3b74d1eb384d021be1f1577c3 *./Pictures/london/P9110042.JPG
Where there are duplicate hashes (comparing just the hashes), I want to print the whole line of every match.
So, say there were hashes A, B and C:
A 1
B 2
B 3
C 4
C 5
C 6
In this example, all lines except the first one should be printed:
B 2
B 3
C 4
C 5
C 6
Before you continue, look up fdupes.
If you don't want to use a robust tool specifically intended to find duplicate files, you can use sort | uniq:
$ cat file
A 1
B 2
B 3
C 4
C 5
C 6
$ sort file | uniq -w 1 -D
B 2
B 3
C 4
C 5
C 6
Using awk you can do (will work with unsorted file also):
awk 'FNR==NR{seen[$1]++; next} seen[$1]>1' file file
B 2
B 3
C 4
C 5
C 6
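The two-pass awk command above reads the file twice, so it cannot take its input from a pipe. If the listing is already sorted (or can be piped through sort), a single pass that buffers one group at a time is another option. The sketch below groups on the first whitespace-separated field, so the hash length does not matter and paths containing spaces are left intact. The hashes are shortened placeholders and the paths are hypothetical.

```shell
# Toy listing in the "hash *path" layout (placeholder hashes, made-up paths).
listing=$(printf '%s\n' \
  'ccc3 *./Pictures/three.jpg' \
  'bbb2 *./Pictures/london/two.jpg' \
  'aaa1 *./Pictures/one.jpg' \
  'bbb2 *./Pictures/camera/London 170713/two.jpg')

# Sort so equal hashes are adjacent, then buffer the lines of the current
# hash and print the buffer only if it holds more than one line.
dupes=$(printf '%s\n' "$listing" | LC_ALL=C sort | awk '
  $1 != prev { if (n > 1) printf "%s", group; group = ""; n = 0; prev = $1 }
  { group = group $0 "\n"; n++ }
  END { if (n > 1) printf "%s", group }')

printf '%s\n' "$dupes"
```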

Join multiple tables by row names [duplicate]

This question already has answers here:
Merging very large csv files with common column
(6 answers)
Closed 8 years ago.
I would like to merge multiple tables by row names. The tables differ in the number of rows, and they have both unique and shared rows, all of which should appear in the output. If possible I would like to solve the problem with awk, but I am also fine with other solutions.
table1.tab
a 5
b 5
d 9
table2.tab
a 1
b 2
c 8
e 11
I would like to obtain the following table as output:
table3.tab
a 5 1
b 5 2
d 9 0
c 0 8
e 0 11
I tried using join
join table1.tab table2.tab > table3.tab
but I get
table3.tab
a 5 1
b 5 2
Rows c, d and e are missing from the output.
You want to do a full outer join:
join -a1 -a2 -e 0 -o 0,1.2,2.2 table1.tab table2.tab
a 5 1
b 5 2
c 0 8
d 9 0
e 0 11
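This awk one-liner should work for your example:
awk 'NR==FNR{a[$1]=$2;k[$1];next}{b[$1]=$2;k[$1]}
END{for(x in k)printf"%s %d %d\n",x,a[x],b[x]}' table1.tab table2.tab
Test (awk does not guarantee the iteration order of for(x in k), so pipe the result through sort if the row order matters):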
kent$ head f1 f2
==> f1 <==
a 5
b 5
d 9
==> f2 <==
a 1
b 2
c 8
e 11
kent$ awk 'NR==FNR{a[$1]=$2;k[$1];next}{b[$1]=$2;k[$1]}END{for(x in k)printf"%s %d %d\n",x,a[x],b[x]}' f1 f2
a 5 1
b 5 2
c 0 8
d 9 0
e 0 11
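Since the question asks about multiple tables, the same idea can be extended to any number of input files. The sketch below is one way to do it, assuming two-column tables, non-empty files, and 0 as the filler for missing values. It also records the order in which row names first appear, so the output does not depend on awk's array iteration order. The scratch directory and variable names are illustrative.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'a 5' 'b 5' 'd 9' > "$workdir/table1.tab"
printf '%s\n' 'a 1' 'b 2' 'c 8' 'e 11' > "$workdir/table2.tab"

merged=$(awk '
  FNR == 1 { nfiles++ }    # the first line of each file bumps the file counter
  {
    # Remember row names in first-seen order, and store the value per file.
    if (!($1 in seen)) { seen[$1]; order[++nkeys] = $1 }
    val[$1, nfiles] = $2
  }
  END {
    for (i = 1; i <= nkeys; i++) {
      key = order[i]; row = key
      for (f = 1; f <= nfiles; f++)
        row = row " " (((key, f) in val) ? val[key, f] : 0)
      print row
    }
  }' "$workdir/table1.tab" "$workdir/table2.tab")

printf '%s\n' "$merged"
rm -rf "$workdir"
```

Further tables can be appended to the argument list; each adds one more value column.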
