Unix shell scripting - concatenating multiple files - shell

Consider the following scenario:
file1:
mike;john;552
mike;mike;555
john;mike;121
file2:
aks;raj;425
man;joe;895
mike;john;552
Assuming file1 and file2 contain the above two sets of data, I would like to put the data from these two files into another file where the data is unique (meaning: file1 and file2 share the common line mike;john;552, but when combining the files I do not want any duplicates).
I used the command:
cat file1 file2 | sort -u > file3
but this gave me only the common line, i.e., the duplicate, in file3.
I also tried:
cat file1 file2 | uniq > file3
This didn't yield the required result either.
Expected output:
file3:
mike;john;552
mike;mike;555
john;mike;121
aks;raj;425
man;joe;895
Note: the data in file3 can be in any order.
Please help with this.

Your first command works for me and gives the expected output:
$ cat -v file1
mike;john;552
mike;mike;555
john;mike;121
$ cat -v file2
aks;raj;425
man;joe;895
mike;john;552
$ cat file1 file2 | sort -u > file3
$ cat file3
aks;raj;425
john;mike;121
man;joe;895
mike;john;552
mike;mike;555
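Incidentally, the cat isn't needed here; sort can read the files directly:
$ sort -u file1 file2 > file3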
If this doesn't happen for you, use cat -vE to find out why. Here are two examples:
$ cat file1 file2 | sort -u
foo
foo
$ cat file1 file2 | sort -u | cat -vE
foo$
foo $
In this case, it looks like you get duplicates, but the lines are actually different because of trailing whitespace.
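If trailing whitespace is the problem, one way to normalize the lines before deduplicating (a sketch, assuming the trailing whitespace carries no meaning) is:
$ sed 's/[[:space:]]*$//' file1 file2 | sort -u > file3   # strip trailing blanks, then deduplicate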
$ cat file1 file2 | sort -u
foo
foo
$ cat file1 file2 | sort -u | cat -vE
foo$
foo^M$
In this case, it also looks like you get duplicates, but one file has carriage returns because it was saved in DOS/Windows mode instead of Unix mode.
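If carriage returns are the problem, strip them before sorting (a sketch; dos2unix, if installed, does the same job):
$ cat file1 file2 | tr -d '\r' | sort -u > file3   # delete DOS carriage returns, then deduplicate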

Related

Finding common lines for two files using bash

I am trying to compare two files and output a file consisting of the names common to both.
File1
1990.A.BHT.s_fil 4.70
1991.H.BHT.s_fil 2.34
1992.O.BHT.s_fil 3.67
1993.C.BHT.s_fil -1.50
1994.I.BHT.s_fil -3.29
1995.K.BHT.s_fil -4.01
File2
1990.A.BHT_ScS.dat 1537 -2.21
1993.C.BHT_ScS.dat 1494 1.13
1994.I.BHT_ScS.dat 1545 0.15
1995.K.BHT_ScS.dat 1624 1.15
I want to compare the first parts of the names (e.g., 1990.A.BHT) in both files and output to file3 the common names together with the values from the 2nd column of file1.
ex: file3 (output)
1990.A.BHT.s_fil 4.70
1993.C.BHT.s_fil -1.50
1994.I.BHT.s_fil -3.29
1995.K.BHT.s_fil -4.01
I used the following code, which uses the grep command:
while read line
do
grep $line file1 >> file3
done < file2
and
grep -wf file1 file2 > file3
I sort the files before using this script.
But I get an empty file3. Can someone help me with this please?
You need to remove everything starting from _ScS.dat from the lines in file2. Then you can use the result as patterns to match lines in file1:
grep -F -f <(sed 's/_ScS\.dat.*//' file2) file1 > file3
The -F option matches the patterns as fixed strings rather than treating them as regular expressions.
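For the sample file2, the sed step produces these patterns:
1990.A.BHT
1993.C.BHT
1994.I.BHT
1995.K.BHT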
In your example data, the lines appear to be in sorted order. If you can guarantee that they always are, comm -1 -2 file1 file2 would do the job. If they can be unsorted, do a
comm -1 -2 <(sort file1) <(sort file2)
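An awk alternative that does the same prefix match in one pass (a sketch, assuming the shared key is everything before .s_fil in file1 and before _ScS.dat in file2):
awk 'NR==FNR { sub(/_ScS\.dat.*/, ""); keys[$0]; next }    # first pass: collect keys from file2
     { k = $1; sub(/\.s_fil$/, "", k); if (k in keys) print }' file2 file1 > file3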

Shell script for merging dotenv files with duplicate keys

Given two dotenv files,
# file1
FOO="X"
BAR="B"
and
# file2
FOO="A"
BAZ="C"
I want to run
$ ./merge.sh file1.env file2.env > file3.env
to get the following output:
# file3
FOO="A"
BAR="B"
BAZ="C"
So far, I have used the python-dotenv module to parse the files into dictionaries, merge them, and write them back. However, I feel like there should be a simple solution in shell that rids me of a third-party module for such a basic task.
Answer
Alright, so I ended up using
$ sort -u -t '=' -k 1,1 file1 file2 | grep -v '^$\|^\s*\#' > file3
which omits blank lines and comments. Nevertheless, the proposed awk solution works just as well.
Another quite simple approach is to use sort:
sort -u -t '=' -k 1,1 file1 file2 > file3
results in a file where the keys from file1 take precedence over the keys from file2.
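With GNU sort, -u keeps the first line of each run of key-equal lines, so if you want file2 to win (as in the question's expected output), list it first:
sort -u -t '=' -k 1,1 file2 file1 > file3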
Using a simple awk script:
awk -F= '{a[$1]=$2}END{for(i in a) print i "=" a[i]}' file1 file2
This stores each key's value in the array a and prints the array contents once both files have been parsed (note that for (i in a) yields the keys in no particular order).
Keys that appear in file2 override the ones from file1.
To add new keys from file2 only, without overwriting the initial values from file1 (skipping blank lines in file2):
grep "\S" file2 >> file1
awk -F "=" '!a[$1]++' file1 > file3

Why does awk behave differently in terminal vs in a perl script?

I want to get the first occurrence of a file in a directory matching some pattern. This is the command I'm using:
ls my/file/path/pattern* | awk '{print $1}'
Should my directory contain the files pattern1, pattern2, pattern3, etc., this command will return only pattern1. This works as expected in the terminal window.
It fails in a perl script. My commands are
push @arr, `ls path/to/my/dir/pattern* | awk \'{print \$1}\'`;
print @arr;
The output here is
pattern1
pattern2
pattern3
I expect the output to be only pattern1. Why is the entirety of ls's output being dumped into the array?
Edit: I have found a workaround doing
my $temp = (`ls path/to/my/dir/pattern* | awk \'{print \$1}\'`)[0];
But I am still curious why the other way doesn't work.
Edit: There are a couple of commenters saying my terminal command doesn't work as I describe, so here's a screenshot. I know awk returns the first column of each line, but it should work as I described if only a few files match.
Solved it! When I execute ls in the terminal window, its output is formatted in columns, like this:
file1 file2 file3
file4 file5 file6
but when its output is captured (as with backticks), ls writes one entry per line, so the perl command
my $test = `ls`;
stores the results like this:
file1
file2
file3
file4
file5
file6
Since awk '{print $1}' prints column 1 of each row, running the command
ls file* | awk '{print $1}'
in perl returns all matches.
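If the goal is just the first matching file, a pure-Perl alternative avoids shelling out entirely (a sketch; glob returns its matches sorted, so the first element is the one you want):
my ($first) = glob 'path/to/my/dir/pattern*';   # first match in sorted order, or undef if none
push @arr, $first if defined $first;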

bash delete lines in file containing lines from another file

file1 contains:
someword0
someword2
someword4
someword6
someword8
someword9
someword1
file2 contains:
someword2
someword3
someword4
someword5
someword7
someword11
someword1
So I want to keep only the lines from file1 that file2 doesn't contain. How can I do this in bash?
That's the answer:
grep -v -x -f file2 file1
-v to select non-matching lines
-x to match whole lines only
-f file2 to get the patterns from file2
You can also use grep with the -w and -F options:
grep -vwFf file2 file1
someword0
someword6
someword8
someword9
Check man grep for detailed info on all the grep options used here.
You could use the comm command as well:
comm -23 file1 file2
Explanation:
comm compares two sorted files and prints, in 3 columns, the lines unique to file1, the lines unique to file2, and the lines common to both.
Using the options -2 and -3 (or simply -23) suppresses printing of the last two columns, so you just get the lines unique to file1. Note that comm requires sorted input, so for unsorted files like these, use comm -23 <(sort file1) <(sort file2).
If your lines are unique, do a left join and filter out the lines that exist in both files:
join <(sort file1) <(sort file2) -o0,2.1 -a1 | awk '!$2'
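Another common idiom is an awk anti-join, which needs no sorting (a short sketch):
awk 'NR==FNR { seen[$0]; next } !($0 in seen)' file2 file1   # load file2 into a set, print file1 lines not in it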

UNIX - Simple merging of two files as in the input

Input File1:
HELLO
HOW
Input File2:
ARE
YOU
output file should be
HELLO
HOW
ARE
YOU
My input files will be in one folder, and my script has to fetch the input files from that folder and merge them all in the order given above.
Thanks
You can simply use cat as shown below:
cat file1 file2
or, to concatenate all files in a folder (assuming there are not too many):
cat folder/*
sed '' file1 file2
Hope this works fine.
cat:
cat file1 file2 >output
perl:
perl -plne '' file1 file2 >output
awk:
awk '1' file1 file2 >output
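And since the question mentions fetching all input files from a folder, a minimal script sketch (the name concat.sh and the argument handling are assumptions; glob expansion is lexical, so name the files in the order you want them merged):
#!/bin/sh
# concat.sh - concatenate all regular files in a folder, in glob (lexical) order
dir=${1:?usage: concat.sh DIR > output}
for f in "$dir"/*; do
    [ -f "$f" ] && cat -- "$f"
done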
