I have a file1.txt whose content is:
test4 30
test6 29
test3 17
test2 12
test5 5
This file is ordered by the second column; I sorted it with sort -nr -k 2.
I also have file2.txt with the content:
test2 A
test3 B
test4 C
test5 D
test6 E
What I want as the result (result.txt) is:
test4 C 30
test6 E 29
test3 B 17
test2 A 12
test5 D 5
Using awk:
awk 'FNR == NR { a[$1] = $2; next } { print $1, a[$1], $2 }' file2 file1
Output:
test4 C 30
test6 E 29
test3 B 17
test2 A 12
test5 D 5
If file1 is not yet sorted, you can do:
sort -nr -k 2 file1 | awk 'FNR == NR { a[$1] = $2; next } { print $1, a[$1], $2 }' file2 -
Or
awk 'FNR == NR { a[$1] = $2; next } { print $1, a[$1], $2 }' file2 <(sort -nr -k 2 file1)
There are many ways to format the output. You can use column -t:
... | column -t
Output:
test4 C 30
test6 E 29
test3 B 17
test2 A 12
test5 D 5
Or you can use printf, although I'd prefer column -t, since the table would break if a column grows wider than the field width that printf provides.
... { printf "%s%3s%4.2s\n", $1, a[$1], $2 }' ...
Output:
test4 C 30
test6 E 29
test3 B 17
test2 A 12
test5 D 5
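For reference, a full command combining the earlier lookup with this printf (a sketch of the same pipeline) would be:
awk 'FNR == NR { a[$1] = $2; next } { printf "%s%3s%4.2s\n", $1, a[$1], $2 }' file2 file1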
Don't sort the file before processing it; keep it sorted by the 1st column.
Assuming you have:
file1            file2
________________________
test2 12         test2 A
test3 17         test3 B
test4 30         test4 C
test5 5          test5 D
test6 29         test6 E
Using join file2 file1 | sort -nr -k 3 will yield:
test4 C 30
test6 E 29
test3 B 17
test2 A 12
test5 D 5
Use -t' ' if you want your spacing unmodified by join.
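For example (a sketch, assuming both files use single spaces between fields):
join -t' ' file2 file1 | sort -nr -k 3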
Related
I am trying to add up the values in Column B for each unique value in Column A. How can I do it using AWK, or any other way in bash?
Column_A | Column_B
--------------------
A | 1
A | 2
A | 1
B | 3
B | 8
C | 5
C | 8
Result:
Column_A | Column_B
--------------------
A | 6
B | 11
C | 13
Considering that your Input_file is the same as shown, sorted by the first field, could you please try the following (I will edit the solution for alignment soon).
awk '
BEGIN{
  OFS=" | "
}
FNR==1 || /^-/{
  print
  next
}
prev!=$1 && prev{
  print prev,sum
  prev=sum=""
}
{
  sum+=$NF
  prev=$1
}
END{
  if(prev && sum){
    print prev,sum
  }
}' Input_file
Another awk:
$ awk 'NR<3 {print; next}
{a[$1]+=$NF; line[$1]=$0}
END {for(k in a) {sub(/[0-9]+$/,a[k],line[k]); print line[k]}}' file
Column_A | Column_B
--------------------
A | 4
B | 11
C | 13
Note that A totals to 4, not 6: the sample input has 1 + 2 + 1 for A.
One possible solution (assuming the file is in CSV format):
Input :
$ cat csvtest.csv
A,1
A,2
A,3
B,3
B,8
C,5
C,8
$ cat csvtest.csv | awk -F "," '{arr[$1]+=$2} END {for (i in arr) {print i","arr[i]}}'
A,6
B,11
C,13
I have two files: A and B.
Contents of A:
p218 first_departure_date p219 2017-01-03 p220 sg40 Joe p221 expire_date 222 11-09-2024 p223 dob 224 00-00-0000 p225 gender 226 MR p227 last_departure_date 228 2017-01-03
Contents of file B:
p219
p218
p220
p221
p227
p223
p225
p228
Expected results:
first_departure_date 2017-01-03 sg40 Joe expire_date 11-09-2024 dob 00-00-0000 gender MR last_departure_date 2017-01-03
Now, I would like to remove from file A all occurrences of the entries listed in file B.
I have tried the following:
grep -vxFf fileB fileA > fileC
But it didn't do anything at all.
$ awk '
NR==FNR { b[$1]; next }                  # first file: remember every word in fileB
{
    c = 0
    for (i=1; i<=NF; i++) {              # second file: print only fields not listed in fileB
        if ( !($i in b) ) {
            printf "%s%s", (c++ ? OFS : ""), $i
        }
    }
    print ""
}
' fileB fileA
first_departure_date 2017-01-03 sg40 Joe expire_date 222 11-09-2024 dob 224 00-00-0000 gender 226 MR last_departure_date 228 2017-01-03
This might work for you (GNU sed):
sed 's/[^[:alnum:]]/\\&/g;s/.*/s#&\\s*##g/' fileB | sed -f - fileA
This uses fileB to create a sed script which removes from fileA any word listed in fileB, together with any following whitespace.
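For instance, with the fileB shown above, the script generated by the first sed command would look like:
s#p219\s*##g
s#p218\s*##g
s#p220\s*##g
s#p221\s*##g
s#p227\s*##g
s#p223\s*##g
s#p225\s*##g
s#p228\s*##g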
awk '{print "s/"$0"//"}' fileB > out
sed -f out fileA -e 's/^[ ]//'
Sample output:
first_departure_date 2017-01-03 sg40 Joe expire_date 222 11-09-2024 dob 224 00-00-0000 gender 226 MR last_departure_date 228 2017-01-03
I have over 600 files and I need to extract a single column from each of them and write the columns to an output file. My current code does this: it takes the column from every file and writes the columns one after another into the output file. However, I need two things in my output file:
Instead of appending the columns one after another, each column from the input files should be added as a new column in the output file (preferably as a TSV file).
The column name should be replaced by the file name.
My example code:
for f in *; do cat "$f" | tr "\t" "~" | cut -d"~" -f2; done >out.txt
Example input:
file01.txt
col1 col2 col3
1 2 3
4 5 6
7 8 9
10 11 12
file02.txt
col4 col5 col6
11 12 13
14 15 16
17 18 19
110 111 112
My current output:
col2
2
5
8
11
col5
12
15
18
111
Expected output:
file01.txt file02.txt
2 12
5 15
8 18
11 111
You can use awk like this:
awk -v OFS='\t' 'BEGIN {
    for (i=1; i<ARGC-1; i++)
        printf "%s%s", ARGV[i], OFS;
    print ARGV[i];
}
FNR==1 { next }
{
    a[FNR] = (a[FNR]=="" ? "" : a[FNR] OFS) $2
}
END {
    for(i=2; i<=FNR; i++)
        print a[i];
}' file*.txt
file01.txt file02.txt
2 12
5 15
8 18
11 111
Given a file1:
13 a b c d
5 f a c d
7 d c g a
14 a v s d
and a file2:
7 x
5 c
14 a
13 i
I would like to sort file1 following the order of the first column of file2, so that the output is:
7 d c g a
5 f a c d
14 a v s d
13 a b c d
Is it possible to do this in bash or should I use some "higher" language like python?
Use awk to put the line number from file2 as an extra column in front of file1, sort the result by that column, then remove the prefix column:
awk 'FNR == NR { lineno[$1] = NR; next}
{print lineno[$1], $0;}' file2 file1 | sort -k 1,1n | cut -d' ' -f2-
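To see what the decorate step produces before sorting (using the sample files above), the awk stage alone outputs something like:
4 13 a b c d
2 5 f a c d
1 7 d c g a
3 14 a v s d
sort -k 1,1n then orders the lines by that prefix column, and cut -d' ' -f2- removes it again.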
A simple solution (note the anchored grep so that, for example, 1 does not also match 13):
for S in $(awk '{print $1}' file2); do grep "^$S " file1; done
Input File 1
A1 123 AA
B1 123 BB
C2 44 CC1
D1 12 DD1
E1 11 EE1
Input File 2
A sad21 1
DD1 124f2 2
CC 123tges 3
BB 124sdf 4
AA 1asrf 5
Output File
A1 123 AA 1asrf 5
B1 123 BB 124sdf 4
D1 12 DD1 124f2 2
Making of the output file:
We check the 3rd column of Input File 1 against the 1st column of Input File 2.
If they match, we print the combined line to the output file.
Note: the files are not sorted.
I tried:
join -t, A B | awk -F "\t" 'BEGIN{OFS="\t"} {if ($3==$4) print $1,$2,$3,$4,$6}'
But this does not work because the files are unsorted, so the condition ($3==$4) won't hold all the time. Please help.
nawk 'FNR==NR{a[$3]=$0;next}{if($1 in a){p=$1;$1="";print a[p],$0}}' file1 file2
tested below:
> cat file1
A1 123 AA
B1 123 BB
C2 44 CC1
D1 12 DD1
E1 11 EE1
> cat file2
A sad21 1
DD1 124f2 2
CC 123tges 3
BB 124sdf 4
AA 1asrf 5
> awk 'FNR==NR{a[$3]=$0;next}{if($1 in a){p=$1;$1="";print a[p],$0}}' file1 file2
D1 12 DD1 124f2 2
B1 123 BB 124sdf 4
A1 123 AA 1asrf 5
>
You can use join, but you need to sort on the key field first and tell join that the key in the first file is column 3 (-1 3):
join -1 3 <(sort -k 3,3 file1) <(sort file2)
This will get you the correct fields. Output (piped through column -t for formatting):
AA A1 123 1asrf 5
BB B1 123 124sdf 4
DD1 D1 12 124f2 2
To get the same column ordering listed in the question, you need to specify the output format:
join -1 3 -o 1.1,1.2,1.3,2.2,2.3 <(sort -k 3,3 file1) <(sort file2)
i.e. file1 fields 1 through 3, then file2 fields 2 and 3. Output (again with column -t):
A1 123 AA 1asrf 5
B1 123 BB 124sdf 4
D1 12 DD1 124f2 2
perl -F'/\t/' -anle 'BEGIN{$f=1}if($f==1){$H{$F[2]}=$_;$f++ if eof}else{$l=$H{$F[0]};print join("\t",$l,@F[1..$#F]) if defined$l}' f1.txt f2.txt
Or shorter:
perl -F'/\t/' -anle'$f?($l=$H{$F[0]})&&print(join"\t",$l,@F[1..$#F]):($H{$F[2]}=$_);eof&&$f++' f1.txt f2.txt
One way using awk:
awk 'BEGIN { FS=OFS="\t" } FNR==NR { array[$1]=$2 OFS $3; next } { if ($3 in array) print $0, array[$3] }' file2.txt file1.txt
Results:
A1 123 AA 1asrf 5
B1 123 BB 124sdf 4
D1 12 DD1 124f2 2
This might work for you (GNU sed):
sed 's|\(\S*\)\(.*\)|/\\s\1$/s/$/\2/p|' file2 | sed -nf - file1
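For the sample file2 above, the generated sed script would be roughly:
/\sA$/s/$/ sad21 1/p
/\sDD1$/s/$/ 124f2 2/p
/\sCC$/s/$/ 123tges 3/p
/\sBB$/s/$/ 124sdf 4/p
/\sAA$/s/$/ 1asrf 5/p
sed -nf - file1 then appends the file2 fields to every file1 line whose last field matches the key, and prints only those lines.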