Join multiple tables by row names [duplicate] - bash

I would like to merge multiple tables by row names. The tables differ in the number of rows and have both unique and shared rows, all of which should appear in the output. If possible I would like to solve the problem with awk, but I am also fine with other solutions.
table1.tab
a 5
b 5
d 9
table2.tab
a 1
b 2
c 8
e 11
I would like to obtain the following output:
table3.tab
a 5 1
b 5 2
d 9 0
c 0 8
e 0 11
I tried using join:
join table1.tab table2.tab > table3.tab
but I get
table3.tab
a 5 1
b 5 2
Rows c, d and e are not in the output.

You want to do a full outer join:
join -a1 -a2 -o 0,1.2,2.2 -e "0" table1.tab table2.tab
a 5 1
b 5 2
c 0 8
d 9 0
e 0 11
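Note that join requires both inputs to be sorted on the join field; the sample tables already are. If yours are not, a variant using bash process substitution sorts them on the fly:
join -a1 -a2 -o 0,1.2,2.2 -e "0" <(sort table1.tab) <(sort table2.tab)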

This awk one-liner should work for your example:
awk 'NR==FNR{a[$1]=$2;k[$1];next}{b[$1]=$2;k[$1]}
END{for(x in k)printf"%s %d %d\n",x,a[x],b[x]}' table1 table2
Test:
kent$ head f1 f2
==> f1 <==
a 5
b 5
d 9
==> f2 <==
a 1
b 2
c 8
e 11
kent$ awk 'NR==FNR{a[$1]=$2;k[$1];next}{b[$1]=$2;k[$1]}END{for(x in k)printf"%s %d %d\n",x,a[x],b[x]}' f1 f2
a 5 1
b 5 2
c 0 8
d 9 0
e 0 11
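The same logic spelled out as a commented script (my expansion of the one-liner, saved as, say, outerjoin.awk and run as awk -f outerjoin.awk table1.tab table2.tab):
# outerjoin.awk: full outer join of two 2-column files on column 1
NR == FNR { a[$1] = $2; k[$1]; next }   # 1st file: store value, record key
          { b[$1] = $2; k[$1] }         # 2nd file: store value, record key
END {
    for (x in k)                        # every key seen in either file
        printf "%s %d %d\n", x, a[x], b[x]   # %d renders a missing value as 0
}
Note that for (x in k) visits keys in no particular order; pipe the output through sort if order matters.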

Related

How to align rows from two different files by similitude? [duplicate]

I need help aligning two files by matching the values in column 2 of file 1 with the values in column 1 of file 2.
file 1:
1 d 3
2 e 4
5 o 1
file 2:
e 6
o 5
d 8
I want to get
1 d 3 d 8
2 e 4 e 6
5 o 1 o 5
Try using the join command:
join -o "1.1,1.2,1.3,2.1,2.2" -1 2 <(cat file1 | sort) <(cat file2 | sort)
output:
1 d 3 d 8
2 e 4 e 6
5 o 1 o 5
Your files need to be sorted on the join field for this to work; they weren't, so the command above sorts file1 on column 2 and file2 on column 1.
If both files have exactly the same keys (and number of lines), you can use paste:
paste -d' ' <(sort -k2 file1) <(sort file2)

shell command: join crashes for large files?

I have two files: 1.txt and 2.txt.
1.txt has the following content:
a 1 2 3 4 5
b 4 5 6 7 7
c 4 5 6 7 6
d 6 5 4 3 2
and 2.txt:
b
d
I need to extract those lines from 1.txt whose first fields match the first fields of 2.txt:
b 4 5 6 7 7
d 6 5 4 3 2
I thought a simple join command should work for me:
join 1.txt 2.txt
But unfortunately, the command produces just a couple of lines, even though both files are pretty large.
I cannot figure out what's going on.
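The likely cause, as in the examples above: join expects both inputs to be sorted on the join field and silently skips lines that appear out of order. Sorting on the fly should fix it:
join <(sort 1.txt) <(sort 2.txt)
GNU join also has a --check-order option that makes it fail loudly on unsorted input instead of quietly dropping lines.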

Find and replace entries in one csv file using another with bash

Main file:
A B
C D
D A
G H
Ref file:
1 A
2 B
3 C
4 D
5 G
6 H
New file:
1 2
3 4
4 1
5 6
I want to do the above replacement. How can I do that using awk or some simple command line?
awk solution:
awk 'NR==FNR{ a[$2]=$1; next }{ $1=a[$1]; $2=a[$2] }1' reffile mainfile
The output:
1 2
3 4
4 1
5 6
a[$2]=$1 - captures the numbers from reffile into an array indexed by letters (e.g. a["A"]=1)
$1=a[$1]; $2=a[$2] - replaces the letters in mainfile with the respective numbers
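If a letter in mainfile has no entry in reffile, a[$1] is empty and the field gets blanked out. A defensive variant (my addition, not part of the original answer) leaves unknown tokens untouched:
awk 'NR==FNR{ a[$2]=$1; next }
     { if ($1 in a) $1=a[$1]; if ($2 in a) $2=a[$2] }1' reffile mainfile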

How to combine column from multiple text files? [duplicate]

I want to extract and combine a certain column from a bunch of text files into a single file as shown.
File1_example.txt
A 123 1
B 234 2
C 345 3
D 456 4
File2_example.txt
A 123 5
B 234 6
C 345 7
D 456 8
File3_example.txt
A 123 9
B 234 10
C 345 11
D 456 12
...
..
.
File100_example.txt
A 123 55
B 234 66
C 345 77
D 456 88
How can I loop through my files of interest and paste these columns together so that the final result is like below, without having to type out 100 unique file names?
1 5 9 ... 55
2 6 10 ... 66
3 7 11 ... 77
4 8 12 ... 88
Try this:
paste File[0-9]*_example.txt | awk '{i=3;while($i){printf("%s ",$i);i+=3}printf("\n")}'
Example:
File1_example.txt:
A 123 1
B 234 2
C 345 3
D 456 4
File2_example.txt:
A 123 5
B 234 6
C 345 7
D 456 8
Run the command as:
$ paste File[0-9]*_example.txt | awk '{i=3;while($i){printf("%s ",$i);i+=3}printf("\n")}'
Output:
1 5
2 6
3 7
4 8
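One caveat: the glob File[0-9]*_example.txt expands in lexical order, so File10_example.txt sorts before File2_example.txt. If the files really are named File1 through File100, brace expansion keeps the numeric order:
paste File{1..100}_example.txt | awk '{i=3;while($i){printf("%s ",$i);i+=3}printf("\n")}'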
I tested the code below with the first 3 files:
cat File*_example.txt | awk '{a[$1$2]= a[$1$2] $3 " "} END{for(x in a){print a[x]}}' | sort
1 5 9
2 6 10
3 7 11
4 8 12
1) use an awk array: a[$1$2]= a[$1$2] $3 " " indexes on column 1 plus column 2 and appends every column 3 seen for that index.
2) END{for(x in a){print a[x]}} traverses array a and prints all values.
3) use sort to order the output (sort -n would be safer if the first value could grow beyond one digit).
When cating the files you need to ensure the file order is preserved; one way is to specify the files explicitly:
cat File{1..100}_example.txt | awk '{print $NF}' | pr -100ts' '
Extract the last column with awk, then align the values with pr: -100 lays the 400 values out in 100 columns (filled top to bottom, so each column holds one file's data), -t suppresses headers, and -s' ' separates columns with a single space.

bash print complete lines where just the first n characters match

I have created a sorted list of hashes for certain files
ffb01af8fda1e5c3b74d1eb384d021be1f1577c3 *./Pictures/camera/London 170713/P9110042.JPG
ffb01af8fda1e5c3b74d1eb384d021be1f1577c3 *./Pictures/london/P9110042.JPG
Where there are duplicate hashes (comparing just the hashes), I want to print the whole line of every match.
So say there were hashes A, B, C:
A 1
B 2
B 3
C 4
C 5
C 6
In this example all lines except the first should be printed:
B 2
B 3
C 4
C 5
C 6
Before you continue, look up fdupes.
If you don't want to use a robust tool specifically intended to find duplicate files, you can use sort | uniq:
$ cat file
A 1
B 2
B 3
C 4
C 5
C 6
$ sort file | uniq -w 1 -D
B 2
B 3
C 4
C 5
C 6
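The -w 1 compares only the first character, which is enough for the single-letter keys in this example. For the actual hash list (call it hashes.txt), compare the full SHA-1 field, which is 40 hex characters:
sort hashes.txt | uniq -w 40 -D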
Using awk you can do the following (it also works with an unsorted file, since the file is read twice: the first pass counts each key, the second prints lines whose key was seen more than once):
awk 'FNR==NR{seen[$1]++; next} seen[$1]>1' file file
B 2
B 3
C 4
C 5
C 6
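Since the list is already sorted, a single-pass sketch (my variant, not from the original answers) avoids reading the file twice:
awk '$1 == prev { if (held != "") { print held; held = "" } print; next }
     { held = $0; prev = $1 }' file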
