Why does awk behave differently in terminal vs in a perl script? - bash

I want to get the first occurrence of a file in a directory matching some pattern. This is the command I'm using:
ls my/file/path/pattern* | awk '{print $1}'
If my directory contains files pattern1, pattern2, pattern3, etc., this command returns only pattern1. It works as expected in the terminal window.
It fails in a perl script. My commands are:
push @arr, `ls path/to/my/dir/pattern* | awk \'{print \$1}\'`;
print @arr;
The output here is
pattern1
pattern2
pattern3
I expect the output to be only pattern1. Why is the entire output of ls being dumped into the array?
Edit: I have found a workaround doing
my $temp = (`ls path/to/my/dir/pattern* | awk \'{print \$1}\'`)[0];
But I'm still curious why the other way won't work.
Edit: A couple of commenters say my terminal command doesn't work as I describe, so here's a screenshot. I know awk prints the first field of each line, but it should work as I described when only a few files match.

Solved it! When I execute ls in the terminal window, the results look like this:
file1 file2 file3
file4 file5 file6
but the perl command
my $test = `ls`;
stores the results like this
file1
file2
file3
file4
file5
file6
Since awk '{print $1}' prints the first field of every line, running the command
ls file* | awk '{print $1}'
from perl returns every match.
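For what it's worth, a small sketch of my own (not from the thread) that makes the difference visible and sidesteps it: ls formats its output in columns only when stdout is a terminal; when its output goes to a pipe, as it does with Perl's backticks, it prints one entry per line, and awk '{print $1}' then prints every line's first field. Picking the first line with head behaves the same in both situations.
ls path/to/my/dir/pattern*              # columns: stdout is a terminal
ls path/to/my/dir/pattern* | cat        # one entry per line: stdout is a pipe
ls path/to/my/dir/pattern* | head -n 1  # first match either way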

Related

Extracting lines from 2 files using AWK just returns the last match

I'm a bit new to AWK and I'm trying to print the lines of file2 whose first field exists in file1. I copied examples I found here exactly, but I don't know why it prints only the match for the last line of file1.
File1
58000
72518
94850
File2
58000;123;abc
69982;456;rty
94000;576;ryt
94850;234;wer
84850;576;cvb
72518;345;ert
Result Expected
58000;123;abc
94850;234;wer
72518;345;ert
What I'm getting
94850;234;wer
awk -F';' 'NR==FNR{a[$1]++; next} $1 in a' file1 file2
What am I doing wrong?
awk, while usable here, isn't the correct tool for the job; grep with the -f option is. The -f file option reads the patterns from file, one per line, and searches the input file for matches.
So in your case you want:
$ grep -f file1 file2
58000;123;abc
94850;234;wer
72518;345;ert
Using awk
If you did want to rewrite what grep is doing using awk, that is fairly simple: read the contents of file1 into an array, and then, while processing records from the second file, check whether field 1 is in the array; if so, print the record (the default action), e.g.
$ awk -F';' 'FNR==NR {a[$1]=1; next} $1 in a' file1 file2
58000;123;abc
94850;234;wer
72518;345;ert
Thanks @RavinderSingh13!
file1 really did have some hidden characters, and I could see them using cat -v.
$ cat -v file1
58000^M
72518^M
94850^M
I removed them with sed -e "s/\r//g" file1 and the awk command worked perfectly.
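For reference, a variant of my own (not from this thread) that strips a trailing carriage return inside awk itself, so no separate sed pass is needed:
awk -F';' 'NR==FNR {sub(/\r$/, ""); a[$1]; next} $1 in a' file1 file2
The sub() rewrites $0 before the key is stored, so the lookup keys no longer carry the hidden \r.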

Shell script for merging dotenv files with duplicate keys

Given two dotenv files,
# file1
FOO="X"
BAR="B"
and
# file2
FOO="A"
BAZ="C"
I want to run
$ ./merge.sh file1.env file2.env > file3.env
to get the following output:
# file3
FOO="A"
BAR="B"
BAZ="C"
So far I have used the python-dotenv module to parse the files into dictionaries, merge them, and write them back. However, I feel like there should be a simple shell solution that rids me of a third-party module for such a basic task.
Answer
Alright, so I ended up using
$ sort -u -t '=' -k 1,1 file1 file2 | grep -v '^$\|^\s*\#' > file3
which omits blank lines and comments. Nevertheless, the proposed awk solution works just as well.
Another quite simple approach is to use sort:
sort -u -t '=' -k 1,1 file1 file2 > file3
results in a file where the keys from file1 take precedence over the keys from file2.
Using a simple awk script:
awk -F= '{a[$1]=$2}END{for(i in a) print i "=" a[i]}' file1 file2
This stores every key's value in the array a and prints the array contents once both files have been parsed; keys that appear in file2 override those from file1. Note that for (i in a) makes no guarantee about the output order.
To add new values only from file2 and NOT overwrite the initial values from file1 (skipping blank lines in file2):
grep "\S" file2 >> file1
awk -F "=" '!a[$1]++' file1 > file3

Compare 2 csv files and delete rows - Shell

I have 2 csv files. One has several columns; the other is just one column with domains. Simplified data for these files would be:
file1.csv:
John,example.org,MyCompany,Australia
Lenny,domain.com,OtherCompany,US
Martha,site.com,ThirdCompany,US
file2.csv:
example.org
google.es
mysite.uk
The output should be
Lenny,domain.com,OtherCompany,US
Martha,site.com,ThirdCompany,US
I have tried this solution
grep -v -f file2.csv file1.csv >output-file
Found here
http://www.unix.com/shell-programming-and-scripting/177207-removing-duplicate-records-comparing-2-csv-files.html
But since there is no explanation whatsoever of how the command works, and I suck at shell, I cannot tweak it to make it work for me.
A solution for this would be highly appreciated; a solution with some explanation would be awesome! :)
EDIT:
I have tried the line that was supposed to work, but for some reason it does not. Here is the output from my terminal. What's wrong with this?
Desktop $ cat file1.csv ; echo
John,example.org,MyCompany,Australia
Lenny ,domain.com,OtherCompany,US
Martha,mysite.com,ThirCompany,US
Desktop $ cat file2.csv ; echo
example.org
google.es
mysite.uk
Desktop $ grep -v -f file2.csv file1.csv
John,example.org,MyCompany,Australia
Lenny ,domain.com,OtherCompany,US
Martha,mysite.com,ThirCompany,US
Why doesn't grep remove the line
John,example.org,MyCompany,Australia
The line you posted works just fine.
$ grep -v -f file2.csv file1.csv
Lenny,domain.com,OtherCompany,US
Martha,site.com,ThirdCompany,US
And here's an explanation. grep will search for a given pattern in a given file and print all lines that match. The simplest example of usage is:
$ grep John file1.csv
John,example.org,MyCompany,Australia
Here we used a simple literal pattern, but you can also use regular expressions (basic, extended, and even Perl-compatible ones).
To invert the logic, and print only the lines that do not match, we use the -v switch, like this:
$ grep -v John file1.csv
Lenny,domain.com,OtherCompany,US
Martha,site.com,ThirdCompany,US
To specify more than one pattern, you can use the option -e pattern multiple times, like this:
$ grep -v -e John -e Lenny file1.csv
Martha,site.com,ThirdCompany,US
However, if there are many patterns to check, we can use the -f file option, which reads all the patterns from the specified file.
So, when we combine those two things, reading patterns from a file with -f and inverting the matching logic with -v, we get the line you need.
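One caveat worth adding (mine, not part of the original answer): grep treats each line of file2.csv as a basic regular expression, so the dots in the domains can match any character. If that ever produces a stray match, the -F option switches grep to fixed-string matching:
grep -v -F -f file2.csv file1.csv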
One in awk:
$ awk -F, 'NR==FNR{a[$1];next} !($2 in a)' file2 file1
Lenny,domain.com,OtherCompany,US
Martha,site.com,ThirdCompany,US
Explained:
$ awk -F, ' # using awk, comma-separated records
NR==FNR { # process the first file, file2
a[$1] # hash the domain to a
next # proceed to next record
}
!($2 in a)  # process file1: if the domain in $2 is not in a, print the record
' file2 file1 # file order is important
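As for the edit above asking why grep didn't remove the John line: one possible culprit (a guess on my part, not confirmed anywhere in this thread) is Windows line endings in file2.csv, the same hidden-carriage-return problem that shows up in the other threads on this page. Each pattern would then end in an invisible \r and never match. A quick way to check, and a fix that writes the cleaned patterns to an illustrative file name, file2.clean.csv:
cat -v file2.csv                              # a trailing ^M on each line means carriage returns are present
sed -e 's/\r$//' file2.csv > file2.clean.csv
grep -v -f file2.clean.csv file1.csv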

Unix shell scripting - multiple file concatenate

Consider the scenario:
file1:
mike;john;552
mike;mike;555
john;mike;121
file2:
aks;raj;425
man;joe;895
mike;john;552
Assuming file1 and file2 contain the above two sets of data, I would like to put the data from these two files into another file where the data is unique (meaning: file1 and file2 contain the common line mike;john;552, but when combining the files I do not want duplicates).
I used the command:
cat file1 file2 | sort -u > file3
but this gave me only the common line, i.e. the duplicate, in file3.
Also tried
cat file1 file2 | uniq > file3
That didn't yield the required result either.
Expected output:
file3:
mike;john;552
mike;mike;555
john;mike;121
aks;raj;425
man;joe;895
Note: the data in file3 can be in any order.
Please help with this.
Your first command works for me and gives the expected output:
$ cat -v file1
mike;john;552
mike;mike;555
john;mike;121
$ cat -v file2
aks;raj;425
man;joe;895
mike;john;552
$ cat file1 file2 | sort -u > file3
$ cat file3
aks;raj;425
john;mike;121
man;joe;895
mike;john;552
mike;mike;555
If this doesn't happen for you, use cat -vE to find out why. Here are two examples:
$ cat file1 file2 | sort -u
foo
foo
$ cat file1 file2 | sort -u | cat -vE
foo$
foo $
In this case, it looks like you get duplicates, but the lines are actually different because of trailing whitespace.
$ cat file1 file2 | sort -u
foo
foo
$ cat file1 file2 | sort -u | cat -vE
foo$
foo^M$
In this case, it also looks like you get duplicates, but one file has carriage returns because it was saved in DOS/Windows mode instead of Unix mode.
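If one of those two problems turns out to be the cause, one possible fix (a sketch, assuming it is safe to strip trailing whitespace and carriage returns) is to normalize the lines before deduplicating:
cat file1 file2 | sed -e 's/\r$//' -e 's/[[:space:]]*$//' | sort -u > file3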

Unix: One line bash command to merge 3 files together, extracting only the first line of each

I am having a hard time with my syntax here:
I have 3 files with various content, file1, file2, and file3 (100+ lines each). I am trying to merge them together, but only the first line of each file should be merged. The point is to do it using one line of bash code:
sed -n 1p file1 file2 file3 returns only the first line of file1
You might want to try
head -n1 -q file1 file2 file3.
It's not clear whether by merge you mean concatenate or join.
With awk, joining (the first line of each file printed side by side):
$ awk 'FNR==1{printf "%s ",$0}' file1 file2 file3
1 2 3
With awk, concatenating (the first line of each file printed one after another):
$ awk 'FNR==1' file1 file2 file3
1
2
3
I suggest you use head, as explained in themel's answer. However, if you insist on using sed, you cannot simply pass all the files to it, since they are implicitly concatenated and you lose track of which line is the first line of each file. So, if you really want to do it with sed, you need bash to help you out:
for f in file1 file2 file3; do sed -n 1p "$f"; done
You can avoid calling external processes by using the read built-in command:
for f in file1 file2 file3; do read -r l < "$f"; echo "$l"; done > merged.txt
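If the side-by-side ("joined") variant is what's wanted and bash is available, process substitution with head and paste is another option (a sketch of mine, not taken from the answers above):
paste -d' ' <(head -n 1 file1) <(head -n 1 file2) <(head -n 1 file3)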
