diff command and writing output to tab separated file - bash

I have two txt files
file 1:
a 1
b 2
d 4
and file 2:
a 1
d 4
I want the lines that are in file1 but not in file2 written to a tab-separated file3, i.e.
b 2
I use
diff file1 file2 | grep ">" > file3
file3 has the right lines but I want to get rid of the ">" symbol.
Can you suggest how I can do this?
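
If you want to stay with diff, sed can do the filtering and strip the marker in one pass. Note that diff prefixes lines unique to the first file with "< " (lines unique to the second file get "> "), so match that marker (a sketch, assuming file1 and file2 as above):
# print only diff lines starting with "< ", removing the marker itself
diff file1 file2 | sed -n 's/^< //p' > file3
With the sample files this leaves just "b 2" in file3.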

You don't want diff here; you want comm.
comm -2 -3 file1 file2
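One caveat: comm expects both inputs to be lexically sorted. If they may not be, sort them on the fly with process substitution (a bash sketch):
# -2 suppresses lines unique to file2, -3 suppresses common lines,
# so only lines unique to file1 remain
comm -23 <(sort file1) <(sort file2) > file3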

Here is an awk command that doesn't require input files to be sorted:
awk 'FNR==NR{a[$0]; next} !($0 in a)' file2 file1
b 2
Explanation:
FNR==NR # execute this block for first file in the list (file2)
a[$0] # populate an associative array with key as $0 (full line)
next # move to next record
!($0 in a) # for 2nd file in list (file1) print if a record doesn't exist in array a

awk for string comparison with multiple delimiters

I have a file with multiple delimiters. I'm looking to compare the value after the first /, when read from right to left, with values from another file.
Code:
awk -F'[/|]' 'NR==FNR{a[$3]; next} ($1 in a)' file1 file2 > output
cat file1
AAB/BBC/customer|fed|12931|
/customer|fed|982311|
BXC/DEF/OTTA|fed|92374|
AVD/customer|FST|8736481|
FFS/T6TT/BOSTON|money|18922|
GTS/trust/YYYY|opt|62376|
XXY/IJSH/trust|opt|62376|
cat file2
customer
trust
expected output :-
AAB/BBC/customer|fed|12931|
/customer|fed|982311|
AVD/customer|FST|8736481|
XXY/IJSH/trust|opt|62376|
$ awk -F\| '             # just use one FS
NR==FNR {
  a[$1]
  next
}
{
  n=split($1,t,/\//)     # ... and apply split to the 1st field
  if(t[n] in a)          # and compare the last split part
    print
}' file2 file1
Output:
AAB/BBC/customer|fed|12931|
/customer|fed|982311|
AVD/customer|FST|8736481|
XXY/IJSH/trust|opt|62376|
If you use [/|] you have two delimiters at once, and you can no longer tell what the value after the last pipe was. Reading your question, you want to compare the value after the last slash, without the pipe characters.
If a / has to be present in the string, you can set it as the field separator and check that there are at least 2 fields using NF > 1.
Then take the last field with $NF, split it on |, and check whether the first part is present in array a, which holds the values from file2.
$ cat file1
AAB/BBC/customer|fed|12931|
/customer|fed|982311|
BXC/DEF/OTTA|fed|92374|
AVD/customer|FST|8736481|
FFS/T6TT/BOSTON|money|18922|
GTS/trust/YYYY|opt|62376|
XXY/IJSH/trust|opt|62376|
$ cat file2
customer
trust
Example code
awk -F/ '
NR==FNR {a[$1];next}
NF > 1 {
  split($NF, t, "|")
  if(t[1] in a) print
}
' file2 file1
Output
AAB/BBC/customer|fed|12931|
/customer|fed|982311|
AVD/customer|FST|8736481|
XXY/IJSH/trust|opt|62376|

CSV join from command line

I have two CSV files, and I would like to "merge" them, enriching CSV1 with the data from CSV2. Both of them have the same B column.
CSV1:
A,B,C,D,E
1,2,3,,
1,2,3,,
1,2,3,,
CSV2:
B,D,E
2,4,5
2,4,5
2,4,5
I would like to have:
A,B,C,D,E
1,2,3,4,5
1,2,3,4,5
1,2,3,4,5
Which is the best way to do this? Consider that the files have 2 million rows each.
Extract columns 1 to 3 from CSV1 and columns 2 and 3 from CSV2 using cut, then combine them using paste with , as a custom delimiter.
$ paste -d, <(cut -d, -f1-3 CSV1) <(cut -d, -f2,3 CSV2)
A,B,C,D,E
1,2,3,4,5
1,2,3,4,5
1,2,3,4,5
Here's one in awk looping file1 and using getline to read from file2:
$ awk 'BEGIN {
  FS=OFS=","                  # separators
  file="file2"                # set file2 name
}
{
  printf "%s,%s,%s",$1,$2,$3                       # output from file1
  print (getline < file > 0 ? OFS $2 OFS $3 : "")  # and from file2 if records left
}
END {                            # after processing file1...
  while((getline < file) > 0)    # continue with lines from...
    print "","","",$2,$3         # file2 if any left
}' file1
Output if file2 has more records than file1:
A,B,C,D,E
1,2,3,4,5
1,2,3,4,5
1,2,3,4,5
,,,4,5
and if file1 has more records than file2:
A,B,C,D,E
1,2,3,4,5
1,2,3,4,5
1,2,3

Merge File1 with File2 (keep appending from File1 to File2 until no more rows)

I can't find a solution, so here is the problem.
I want to join the contents of the two files even though their numbers of rows are not equal: keep repeating lines from File2 until the number of rows in File1 is met. The result should be 100 rows (File1) with the contents of File2 repeating 25 times.
File1:
test1#domain.com
test2#domain2.com
test3#domain3.com
test4#domain4.com
File2:
A1,B11
A2,B22
A3,B33
A4,B44
What I want is to combine the files in the following to have the following expected result:
File3:
test1#domain.com,A1,B11
test2#domain2.com,A2,B22
test3#domain3.com,A3,B33
test4#domain4.com,A4,B44
Note here: After it finishes with the 4 rows from File2, start again from first line, then repeat.
test5#domain5.com,A1,B11
test6#domain6.com,A2,B22
test7#domain7.com,A3,B33
test8#domain8.com,A4,B44
The example in your question isn't clear but I THINK this is what you're trying to do:
$ awk -v OFS=',' 'NR==FNR{a[++n]=$0;next} {print $0, a[(FNR-1)%n+1]}' file2 file1
test1#domain.com,A1,B11
test2#domain2.com,A2,B22
test3#domain3.com,A3,B33
test4#domain4.com,A4,B44
test5#domain5.com,A1,B11
test6#domain6.com,A2,B22
The above was run against this input:
$ cat file1
test1#domain.com
test2#domain2.com
test3#domain3.com
test4#domain4.com
test5#domain5.com
test6#domain6.com
$
$ cat file2
A1,B11
A2,B22
A3,B33
A4,B44
Could you please try the following.
awk '
BEGIN{
  OFS=","
}
FNR==NR{
  a[++count]=$0
  next
}
{
  count_curr++
  count_curr=count_curr>count?1:count_curr
  print $0,a[count_curr]
}
' Input_file2 Input_file1
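A quick way to exercise the modulo-cycling idea from the first answer with throwaway files (a sketch, assuming bash):
# build small samples inline
printf 'test1#domain.com\ntest2#domain2.com\ntest3#domain3.com\n' > file1
printf 'A1,B11\nA2,B22\n' > file2
# a[] holds file2; (FNR-1)%n+1 wraps the index back to 1 after n lines
awk -v OFS=',' 'NR==FNR{a[++n]=$0;next} {print $0, a[(FNR-1)%n+1]}' file2 file1
# → test1#domain.com,A1,B11
#   test2#domain2.com,A2,B22
#   test3#domain3.com,A1,B11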

How to add numbers from multiple files and write to another file using unix

I have five files, shown below, each containing a single line of comma-separated values.
File 1
abc,100
File 2
abc,200
File 3
abc,300
File 4
abc,700
File 5
abc,800
I need the output produced by adding up the numbers from all the above files.
The script should be a single line of code.
Output file
abc,2100
awk -F, '{code=$1; total += $2} END {printf("%s,%d\n", code, total)}' file1 file2 file3 file4 file5 > outputfile
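To try it out, the five one-line inputs can be created on the fly (a sketch; file names as in the question):
# create the sample inputs from the question
for pair in file1:100 file2:200 file3:300 file4:700 file5:800; do
  printf 'abc,%s\n' "${pair#*:}" > "${pair%%:*}"
done
awk -F, '{code=$1; total += $2} END {printf("%s,%d\n", code, total)}' file1 file2 file3 file4 file5
# → abc,2100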
Try:
awk -F\, '{a[$1]+=$2}END{for (i in a){print i","a[i]}}' file* > target
This also works for input files with multiple keys.
For the new expected output (multiple keys joined with _):
awk -F\, '{a[$1]+=$2}END{for (i in a){key=key"_"i;cont+=a[i]};sub(/^_/,"",key);print key","cont}' file*
Results
abc_bbc,2100

compare the value from file in awk

I have two files named file1 and file2. Now I want to pick a value from file1 and search for it in the whole of file2; if the record is found, apply one operation to the file1 record, otherwise apply some other operation to it.
file 1
a
s
d
f
g
file 2
q
e
r
a
g
Earlier I was using something like this:
awk -F'|' '{if($1=="abc" || $1=="a") print ......}' file1
Now I have multiple values to compare, and I have put the values in a file (abc,a.....),
but I don't know how to use that file.
Please help.
A common way to do this would be to store the values you are searching for in a hash and use it as a lookup table. Something like this might work for you:
search.awk
FNR==NR {
  seen[$0]
  next
}
$0 in seen {
  print $0 " is in file2"
  next
}
{
  print $0 " is not in file2"
}
Run it like this:
awk -f search.awk file2 file1
Output:
a is in file2
s is not in file2
d is not in file2
f is not in file2
g is in file2
Using awk
awk 'NR==FNR{a[$1];next}{print $0,($1 in a)?"Yes":"No"}' file2 file1
a Yes
s No
d No
f No
g Yes
