Sorry, the title of this question is a little confusing, but I couldn't think of anything else.
I am trying to do something like this:
cat fileA.txt | grep `awk '{print $1}'` fileB.txt
fileA contains 100 lines while fileB contains 100 million lines.
What I want is to take each id from fileA, grep for that id in a different file (fileB), and print the matching line.
e.g. fileA.txt
1234
1233
e.g. fileB.txt
1234|asdf|2012-12-12
5555|asdd|2012-11-12
1233|fvdf|2012-12-11
Expected output is
1234|asdf|2012-12-12
1233|fvdf|2012-12-11
Getting rid of cat and awk altogether:
grep -f fileA.txt fileB.txt
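A quick sanity check with the sample data from the question (file names assumed to match). One caveat: plain -f does substring matching anywhere in the line, so an id like 123 would also match 1234; with GNU grep, anchoring each pattern to the start of the line is a safer variant:

```shell
# Recreate the sample files from the question.
printf '1234\n1233\n' > fileA.txt
printf '1234|asdf|2012-12-12\n5555|asdd|2012-11-12\n1233|fvdf|2012-12-11\n' > fileB.txt

grep -f fileA.txt fileB.txt
# 1234|asdf|2012-12-12
# 1233|fvdf|2012-12-11

# Safer: prefix every id with ^ so it only matches at the start of a line
# (GNU grep accepts patterns on stdin via -f -).
sed 's/^/^/' fileA.txt | grep -f - fileB.txt
# 1234|asdf|2012-12-12
# 1233|fvdf|2012-12-11
```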
awk alone can do that job well:
awk -F'|' 'NR==FNR{a[$0];next;}$1 in a' fileA fileB
see the test:
kent$ head a b
==> a <==
1234
1233
==> b <==
1234|asdf|2012-12-12
5555|asdd|2012-11-12
1233|fvdf|2012-12-11
kent$ awk -F'|' 'NR==FNR{a[$0];next;}$1 in a' a b
1234|asdf|2012-12-12
1233|fvdf|2012-12-11
EDIT: adding an explanation:
-F'|' # use | as the field separator (needed to split fileB's lines into fields)
'NR==FNR{a[$0];next;} # save the lines of fileA in array a
$1 in a # if $1 (the 1st field of the fileB line) is in array a, print the current line from fileB
There are details I cannot explain here, sorry, for example how awk handles two files, and what NR and FNR are. I suggest trying this awk line in case the accepted answer didn't work for you; if you want to dig a little deeper, read some awk tutorials.
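A minimal demo of NR vs FNR (hypothetical file names f1 and f2): NR counts records across all input files, while FNR restarts at 1 for each file, which is why NR==FNR is true only while the first file is being read:

```shell
# Two tiny sample files.
printf 'a\nb\n' > f1
printf 'c\n' > f2
# Print the file name plus both record counters for every line read.
awk '{print FILENAME, NR, FNR}' f1 f2
# f1 1 1
# f1 2 2
# f2 3 1
```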
If the ids are on separate lines, you can use the -f option of grep as such:
cut -d "|" -f1 < fileB.txt | grep -F -f fileA.txt
The cut command ensures that grep searches only the first field of each line; note that the output will then contain just those first fields, not the full lines.
From the man page:
-f FILE, --file=FILE
Obtain patterns from FILE, one per line.
The empty file contains zero patterns, and therefore matches nothing.
(-f is specified by POSIX.)
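For what it's worth, a run on the question's sample data illustrates the point above: because cut strips everything but the first field before grep runs, the output is only the matching ids, not the full lines from fileB.txt:

```shell
# Sample data from the question.
printf '1234\n1233\n' > fileA.txt
printf '1234|asdf|2012-12-12\n5555|asdd|2012-11-12\n1233|fvdf|2012-12-11\n' > fileB.txt
# Only the first field of each fileB line survives into grep's input.
cut -d '|' -f1 < fileB.txt | grep -F -f fileA.txt
# 1234
# 1233
```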
Related
File1
Ada
Billy
Charles
Delta
Eight
File2
Ada,User,xxx
Beba,User,xxx
Charles,Admin,xxx
I am executing the following:
Acc=`cut -d',' -f1 $PATH/File2
for account in `cat $File1 |grep -v Acc`
do
cat......
sed....
How can I correct this?
Expected output
(check which File2 accounts exist in File1):
Ada
Charles
This awk should work for you:
awk -F, 'FNR == NR {seen[$1]; next} $1 in seen' file2 file1
Ada
Charles
If this is not the output you're looking for then edit your question and add your expected output.
Your grep command searches for lines that do not contain the literal string Acc. You need the -f flag, which makes grep read its patterns from a file, something like this:
tmpf=/tmp/$$
cut -d',' -f1 File2 >$tmpf
for account in $(grep -f "$tmpf" File1)
do
...
done
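On a shell that supports process substitution (bash, ksh, zsh), the temporary file can be avoided; adding -Fx makes grep match whole lines literally, so an account name cannot accidentally match as a substring of a longer one. A sketch with the sample data:

```shell
# Sample files from the question.
printf 'Ada\nBilly\nCharles\nDelta\nEight\n' > File1
printf 'Ada,User,xxx\nBeba,User,xxx\nCharles,Admin,xxx\n' > File2
# -F: fixed strings, -x: whole-line match, patterns come from File2's
# first column via process substitution.
grep -Fx -f <(cut -d',' -f1 File2) File1
# Ada
# Charles
```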
I've got two files. File A contains N lines of text, and File B contains a binary pattern of 0s and 1s that is also N lines long.
I want to delete the lines from File A whose line number corresponds to a 0 in File B.
I've read that it might be a good idea to do it with awk, but I don't have any idea how to use it.
The files are very long, around 2000 lines for example (they are video traces).
For example:
File A:
Line 1: 123456
Line 2: 789012
Line 3: 345678
Line 4: 901234
File B:
Line 1: 1
Line 2: 0
Line 3: 0
Line 4: 1
After the execution:
File A:
Line 1: 123456
Line 2: 901234
You can use paste and cut for this:
paste fileB fileA | grep '^1' | cut -f2-
paste fileB fileA - pastes file contents side by side, delimited by a tab
grep '^1' - keeps the lines that start with 1
cut -f2- - extracts the content that we need
Both cut and paste use tab as the default delimiter.
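With the sample data (minus the Line N: prefixes), the pipeline behaves like this:

```shell
# fileA holds the data, fileB the 0/1 flags, one per line.
printf '123456\n789012\n345678\n901234\n' > fileA
printf '1\n0\n0\n1\n' > fileB
# Prefix each data line with its flag, keep only flag-1 lines, strip the flag.
paste fileB fileA | grep '^1' | cut -f2-
# 123456
# 901234
```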
This is very similar to Benjamin's solution. A small advantage here is that it would work even if fileA were to have more than one field per line.
Assuming Line 1: etc. don't really exist in your input files, all you need is:
awk 'NR==FNR{a[NR]=$0;next} a[FNR]' fileB fileA
You could use a decorate – filter – undecorate pattern:
paste fileA fileB | grep -v '0$' | cut -f1
This prints the lines of each file next to each other (paste), then filters out the lines that end with 0 (grep -v), then removes the column that came from the second file (cut).
This breaks if fileA contains the delimiter used for paste and cut (a tab by default). To avoid that, we could either swap the files (see codeforester's answer) or resort to something like
paste fileA fileB | sed -n '/1$/s/\t.$//p'
(if line ends with 1, remove tab and last character, then print) or
paste fileA fileB | grep -Po '.*(?=\t1$)'
(match only lines ending in 1, use zero-width look-ahead to exclude tab and 1 from match); the last solution requires a grep that supports Perl compatible regular expressions (PCRE) such as GNU grep.
Lots of interesting answers here. Here's a bash one:
while IFS= read -r -u3 line; IFS= read -r -u4 bool; do
((bool == 1)) && printf "%s\n" "$line"
done 3<fileA 4<fileB
This will be much slower than other solutions.
Another paste/awk solution. If a tab appears in the data, pick another delimiter.
paste file2 file1 | awk -F'\t' '$1{print $2}'
A single awk command can read from both files.
awk '(getline flag < "fileB") > 0 && flag' fileA
After reading each line from fileA, read a line from fileB into a variable flag and test if its integer value is true or not. For true values, the line from fileA is printed.
Depending on your version of awk, you may need to use int(flag) or flag+0 to force the value to be treated as an integer rather than an ordinary non-empty string.
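A sketch of the flag+0 variant on the sample data (again without the Line N: prefixes); note that the awk program refers to fileB by name, so it must be run from the directory containing it:

```shell
printf '123456\n789012\n345678\n901234\n' > fileA
printf '1\n0\n0\n1\n' > fileB
# For each fileA record, read the matching fileB line into flag;
# flag+0 forces numeric evaluation, so "0" is false and "1" is true.
awk '(getline flag < "fileB") > 0 && flag+0' fileA
# 123456
# 901234
```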
EDIT: as per codeforester's comment, if Line 1, Line 2, etc. are not part of your File1 and File2, then the following may help.
awk 'FNR==NR{a[FNR]=$0;next} $0!=0{print a[FNR]}' filea fileb
Solution 2: reading the fileb file first and then filea:
awk 'FNR==NR{if($0!=0){a[FNR]=$0};next} a[FNR]' fileb filea
An alternative to the first solution, in case the files really do contain the line1, line2 strings.
The following awk may help here too.
awk '
FNR==NR{
a[FNR]=$NF;
next}
$NF!=0{
printf("%s%s\n","Line " ++count": ",a[FNR])
}' filea fileb
paste and sed combo:
paste -d'\n' fileB fileA | sed -n '/^1$/{n;p}'
123456
901234
You interleave the files:
1
123456
0
789012
0
345678
1
901234
Then you use sed to print the lines that directly follow a line containing only a 1. However, this will not behave properly if fileA itself has entries consisting only of a 1. In that case you have to use the following sed command, which takes into account whether we are currently processing an odd or even line:
paste -d'\n' fileB fileA | sed -n '1~2{/^1$/{n;p}}'
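The difference shows up with data that looks like a flag. Suppose fileB is 0,1 and fileA contains the line 1 followed by foo (hypothetical values): the naive command mistakes the data line 1 for a flag, while the 1~2 version (the first~step address is a GNU sed extension) only tests odd-numbered lines:

```shell
printf '0\n1\n' > fileB
printf '1\nfoo\n' > fileA
# Naive version: treats fileA's data line "1" as a flag.
paste -d'\n' fileB fileA | sed -n '/^1$/{n;p}'
# 1
# Odd-line-aware version: only flag lines are tested.
paste -d'\n' fileB fileA | sed -n '1~2{/^1$/{n;p}}'
# foo
```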
I have 2 csv files. One has several columns; the other is just one column with domains. Simplified versions of these files would be:
file1.csv:
John,example.org,MyCompany,Australia
Lenny,domain.com,OtherCompany,US
Martha,site.com,ThirdCompany,US
file2.csv:
example.org
google.es
mysite.uk
The output should be
Lenny,domain.com,OtherCompany,US
Martha,site.com,ThirdCompany,US
I have tried this solution
grep -v -f file2.csv file1.csv >output-file
Found here
http://www.unix.com/shell-programming-and-scripting/177207-removing-duplicate-records-comparing-2-csv-files.html
But since there is no explanation whatsoever of how the script works, and I suck at shell, I cannot tweak it to make it work for me.
A solution for this would be highly appreciated, a solution with some explanation would be awesome! :)
EDIT:
I have tried the line that was supposed to work, but for some reason it does not. Here is the output from my terminal. What's wrong with this?
Desktop $ cat file1.csv ; echo
John,example.org,MyCompany,Australia
Lenny ,domain.com,OtherCompany,US
Martha,mysite.com,ThirCompany,US
Desktop $ cat file2.csv ; echo
example.org
google.es
mysite.uk
Desktop $ grep -v -f file2.csv file1.csv
John,example.org,MyCompany,Australia
Lenny ,domain.com,OtherCompany,US
Martha,mysite.com,ThirCompany,US
Why doesn't grep remove the line
John,example.org,MyCompany,Australia
The line you posted works just fine.
$ grep -v -f file2.csv file1.csv
Lenny,domain.com,OtherCompany,US
Martha,site.com,ThirdCompany,US
And here's an explanation. grep will search for a given pattern in a given file and print all lines that match. The simplest example of usage is:
$ grep John file1.csv
John,example.org,MyCompany,Australia
Here we used a simple pattern that matches literally, but you can also use regular expressions (basic, extended, and even Perl-compatible ones).
To invert the logic, and print only the lines that do not match, we use the -v switch, like this:
$ grep -v John file1.csv
Lenny,domain.com,OtherCompany,US
Martha,site.com,ThirdCompany,US
To specify more than one pattern, you can use the option -e pattern multiple times, like this:
$ grep -v -e John -e Lenny file1.csv
Martha,site.com,ThirdCompany,US
However, if there is a larger number of patterns to check for, we might use the -f file option that will read all patterns from a file specified.
So, when we combine all of those; reading patterns from a file with -f and inverting the matching logic with -v, we get the line you need.
One in awk:
$ awk -F, 'NR==FNR{a[$1];next}($2 in a==0)' file2 file1
Lenny,domain.com,OtherCompany,US
Martha,site.com,ThirdCompany,US
Explained:
$ awk -F, ' # using awk, comma-separated records
NR==FNR { # process the first file, file2
a[$1] # hash the domain to a
next # proceed to next record
}
($2 in a==0) # process file1, if domain in $2 not in a, print the record
' file2 file1 # file order is important
I have two files, file1.txt and file2.txt. Both have the same number of lines, but some of the lines in file1.txt are empty. This is easiest to see when the contents of the two files are displayed in parallel:
file1.txt    file2.txt
cat          bear
fish         eagle
spider       leopard
             snail
catfish      rainbow trout
             snake
             koala
rabbit       fish
I need to assemble these files together, such that the empty lines in file1.txt are filled with the data found in the lines (of the same line number) from file2.txt. The result in file3.txt would look like this:
cat
fish
spider
snail
catfish
snake
koala
rabbit
The best I can do so far, is create a while read -r line loop, create a counter that counts how many times the while loop has looped, then use an if-conditional to check if $line is empty, then use cut to obtain the line number from file2.txt according to the number on the counter. This method seems really inefficient.
Sometimes file2.txt might contain some empty lines. If file1.txt has an empty line and file2.txt also has an empty line in the same place, the result is an empty line in file3.txt.
How can I fill the empty lines in one file with corresponding lines from another file?
paste file1.txt file2.txt | awk -F '\t' '$1 { print $1 ; next } { print $2 }'
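With the sample data this produces the expected merge. One caveat: `$1` is tested for truthiness, so a file1 line consisting only of 0 would be treated as if it were empty:

```shell
# Recreate the sample files; the blank lines in file1.txt are the gaps to fill.
printf 'cat\nfish\nspider\n\ncatfish\n\n\nrabbit\n' > file1.txt
printf 'bear\neagle\nleopard\nsnail\nrainbow trout\nsnake\nkoala\nfish\n' > file2.txt
paste file1.txt file2.txt | awk -F '\t' '$1 { print $1 ; next } { print $2 }'
# cat
# fish
# spider
# snail
# catfish
# snake
# koala
# rabbit
```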
Here is the way to handle these files with awk:
awk 'FNR==NR {a[NR]=$0;next} {print (NF?$0:a[FNR])}' file2 file1
cat
fish
spider
snail
catfish
snake
koala
rabbit
First it stores every line of file2 in array a, using the record number as the index.
Then it prints file1, but tests whether file1 contains data for each record.
If there is data for the record, it is used; if not, the line from file2 is taken.
One with getline (harmless in this case):
awk '{getline p<f; print NF?$0:p; p=x}' f=file2 file1
Just for fun:
paste file1.txt file2.txt | sed $'s/^\t//' | cut -f1
This deletes tabs that are at the beginning of a line (those lines missing from file1) and then takes the first column.
(The $'\t' form works in bash/zsh/ksh. Alternatively, type a literal TAB in the pattern; on OSX, where \t doesn't work in sed, you get the TAB character by typing ctrl-V then Tab.)
A solution without awk (this assumes the # character does not occur in the data):
paste -d"#" file1 file2 | sed 's/^#\(.*\)/\1/' | cut -d"#" -f1
Here is a Bash only solution.
for i in 1 2; do
    while IFS= read -r line; do
        if [ "$i" -eq 1 ]; then
            arr1+=("$line")
        else
            arr2+=("$line")
        fi
    done < "file${i}.txt"
done

for r in "${!arr1[@]}"; do
    if [[ -n ${arr1[$r]} ]]; then
        echo "${arr1[$r]}"
    else
        echo "${arr2[$r]}"
    fi
done > file3.txt
I would like to have a shell script that searches two files and returns a list of strings:
File A contains just a list of unique alphanumeric strings, one per line, like this:
accc_34343
GH_HF_223232
cwww_34343
jej_222
File B contains a list of SOME of those strings (sometimes more than once), and a second column of information, like this:
accc_34343 dog
accc_34343 cat
jej_222 cat
jej_222 horse
I would like to create a third file that contains a list of the strings from File A that are NOT in File B.
I've tried using some loops with grep -v, but that doesn't work. So, in the above example, the new file would have this as its contents:
GH_HF_223232
cwww_34343
Any help is greatly appreciated!
Here's what you can do:
grep -v -f <(awk '{print $1}' file_b) file_a > file_c
Explanation:
grep -v : Use -v option to grep to invert the matching
-f : Use -f option to grep to specify that the patterns are from file
<(awk '{print $1}' file_b): The <(awk '{print $1}' file_b) is to simply extract the first column values from file_b without using a temp file; the <( ... ) syntax is process substitution.
file_a : Tell grep that the file to be searched is file_a
> file_c : Output to be written to file_c
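A quick run on the sample data (one caveat: grep treats each pattern as a substring match, so if some ids can be prefixes of others, adding -Fx for whole-line fixed-string matching would be a safer choice):

```shell
# Sample files from the question.
printf 'accc_34343\nGH_HF_223232\ncwww_34343\njej_222\n' > file_a
printf 'accc_34343 dog\naccc_34343 cat\njej_222 cat\njej_222 horse\n' > file_b
# Keep the file_a lines that match none of file_b's first-column values.
grep -v -f <(awk '{print $1}' file_b) file_a
# GH_HF_223232
# cwww_34343
```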
comm is used to find intersections and differences between files:
comm -23 <(sort fileA) <(cut -d' ' -f1 fileB | sort -u)
result:
GH_HF_223232
cwww_34343
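comm expects both inputs sorted and prints three columns: lines only in the first input, lines only in the second, and common lines; -23 suppresses the last two, leaving only the fileA-only strings. A sketch on the sample data (the order of the output follows your locale's sort collation):

```shell
printf 'accc_34343\nGH_HF_223232\ncwww_34343\njej_222\n' > fileA
printf 'accc_34343 dog\naccc_34343 cat\njej_222 cat\njej_222 horse\n' > fileB
# Column 1 of comm = strings present in fileA but absent from fileB's keys.
comm -23 <(sort fileA) <(cut -d' ' -f1 fileB | sort -u)
```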
I assume your shell is bash/zsh/ksh
awk 'FNR==NR{a[$1];next} !($0 in a)' fileB fileA