Merge two files in linux with different column - bash

I have two files in linux, the first file has 4 columns and the second has 2 columns. I want to merge these files into a new file that has the first 3 columns from file 1 and the first column from file 2. I tried awk, but my data from file 2 was placed under file 1.

paste file1 file2 | awk '{print $1,$2,$3,$5}'

Not sure which columns you want from each file, but something like this should work:
paste <file1> <file2> | awk '{print $1,$2,$3,$5}'
The first three columns would be picked from file1, and the fourth skipped, then pick the first column from the second file.

If the files have the same number of rows, you can do something like:
awk '{ getline v < "file2"; split( v, a ); print a[2], $1, $3 }' file1
to print colums 1 and 3 from file 1 and column 2 from file2.

you can try this one without paste command:
awk '{print $1}{print $2}{print $3}' file1 >> mergedfile
awk '{print $2}' file2 >> mergedfile

Related

Extracting unique values between 2 files with awk

I need to get uniq lines when comparing 2 files. These files containing field separator ":" which should be treated as the end of line while comparing strings.
The file1 contains these lines
apple:tasty
apple:red
orange:nice
kiwi:awesome
kiwi:expensive
banana:big
grape:green
orange:oval
banana:long
The file2 contains these lines
orange:nice
banana:long
The output file should be (2 occurrences of orange and 2 occurrences of banana deleted)
apple:tasty
apple:red
kiwi:awesome
kiwi:expensive
grape:green
So the only strings before : should be compared
Is it possible to complete this task in 1 command ?
I tried to complete the task in such way but field separator does not work in that situation.
awk -F: 'FNR==NR {a[$0]++; next} !a[$0]' file1 file2 > outputfile
You basically had it, but $0 refers to the whole line when you want to deal with only the first field, which is $1.
Also you need to take care with the order of the input files. To use the values from file2 for deciding which lines to include from file1, process file2 first:
$ awk -F: 'FNR==NR {a[$1]++; next} !a[$1]' file2 file1
apple:tasty
apple:red
kiwi:awesome
kiwi:expensive
grape:green
One comment: awk is very ineffective with arrays. In real life with big files, better use something like:
comm -3 <(cut -d : -f 1 f1 | sort -u) <(cut -d : -f 1 f2 | sort -u) | grep -h -f /dev/stdin f1 f2

Merging 2 sorted files(with similar content) based on Timestamp in shellscript

I have 2 identical files with below content:
File1:
1,Abhi,Ban,20180921T09:09:01,EmpId1,SalaryX
4,Bbhi,Dan,20180922T09:09:03,EmpId2,SalaryY
7,Cbhi,Ean,20180923T09:09:05,EmpId3,SalaryZ
9,Dbhi,Fan,20180924T09:09:09,EmpId4,SalaryQ
File2:
11,Ebhi,Gan,20180922T09:09:02,EmpId5,SalaryA
12,Fbhi,Han,20180923T09:09:04,EmpId6,SalaryB
3,Gbhi,Ian,20180924T09:09:06,EmpId7,SalaryC
5,Hbhi,Jan,20180925T09:09:08,EmpId8,SalaryD
I want to append all File1's content in Files (based on the date in ascending order)
Outcome:
1,Abhi,Ban,20180921T09:09:01,EmpId1,SalaryX
11,Ebhi,Gan,20180922T09:09:02,EmpId5,SalaryA
4,Bbhi,Dan,20180922T09:09:03,EmpId2,SalaryY
12,Fbhi,Han,20180923T09:09:04,EmpId6,SalaryB
7,Cbhi,Ean,20180923T09:09:05,EmpId3,SalaryZ
3,Gbhi,Ian,20180924T09:09:06,EmpId7,SalaryC
9,Dbhi,Fan,20180924T09:09:09,EmpId4,SalaryQ
5,Hbhi,Jan,20180925T09:09:08,EmpId8,SalaryD
You can use below AWK construct to do this :-
awk -F "," 'NR==FNR{print $4, $0;next} NR>FNR{print $4, $0;}' f1.txt f2.txt | sort | awk '{print $2}'
Explanation :-
Prefix date column ($4) before every line ($0) for both the files.
sort it. And Then print $2 which is whole line.
These printed lines will be in sorted order by date.
f1.txt and f2.txt are two file names.
You can try the following command
awk 'FNR==NR{a[FNR]=$0;next}{print a[FNR]"\n"$0}' file1 file2
with an array a store file1's datas, FNR is a's key.

Compare two csv files, use the first three columns as identifier, then print common lines

I have two csv files. File 1 has 9861 rows and 4 columns while File 2 has 6037 rows and 5 columns.Here are the files.
Link of File 1
Link of File 2
The first three columns are years, months, days respectively.
I want to get the lines in File 2 with the same identifier in File 1 and print this to File 3.
I found this command from some posts here but this only works using one column as identifier:
awk -F, 'NR==FNR {a[$1]=$0;next}; $1 in a {print a[$1]; print}' file1 file2
Is there a way to do this using awk or any simpler commands where I can use the first three columns as identifier?
Ill appreciate any help.
Just use more columns to make the uniqueness you need:
$ awk -F, 'NR==FNR {a[$1, $2, $3] = $0; next}
$1 SUBSEP $2 SUBSEP $3 in a' file1 file2
SUBSEP
is the subscript separator. It has the default value of "\034", and is used to separate the parts of the indices of a multi-dimensional array. Thus, the expression foo["A", "B"] really accesses foo["A\034B"]
awk -F, '{k=$1 FS $2 FS $3} NR==FNR{a[k];next} k in a' file1 file2
Untested of course since you didn't provide any sample input/output.

How to check whehter rows of a file within the rows of another file

I am fresh to Shell or Bash. I have file1 with one column and about 5000 rows and file2 have five columns with 240k rows. How can I check whether the values of the 5000 rows in file1 within or not the second column of file2?
$wc -l file1
$5188
$wc -l file2
$240,888
You can do this with awk, something like this:
awk 'NR == FNR {a[$2] = $1; next} {if ($2 in a){print(a[$2], $1)}}' file1 file2
Basically you read the first file in and store its contents in an array "a". Then you read the second file and check if the second field of each line is contained within array "a" and print it if it is.
My answer assumes your fields are separated by white space, if they are not you will have to change the separator. So, if your fields are separated by commas, you will need:
awk -F, .....
The above syntax does work, and it can be further simplified as:
awk 'FNR==NR{a[$1]=$2; next} {print $1, a[$1]}' file2 file1

how to insert a column between 2 columns in a file?

I'm on Mac. Need to insert a column from file 1 to file 2 that has 4 columns. The inserted column will be between column 1 and 2 in file 2.
can I use "paste", but how to tell it to insert into a specific position?
paste <(awk '{print $1}' file2) file1 <(awk '{print $2, $3, $4}' file2)
This creates three 'files', one with column 1 of file2, then file1, then columns 2-4 of file2, and uses paste to collect them together. The <(...) notation is Process Substitution.
You can like this:
echo "col1 col3 col4" | awk '{print $1,"col2",$2,$3}'
Depending on your delimiter between columns, you can easily modify it accordingly.

Resources