grep matching specific position in lines using words from other file - shell

I have 2 files.
file1:
12342015010198765hello
12342015010188765hello
12342015010178765hello
in which each line contains fields at fixed positions; for example, positions 13 - 17 hold the account_id
file2:
98765
88765
which contains a list of account_ids.
In Korn Shell, I want to print the lines from file1 whose positions 13 - 17 match one of the account_ids in file2.
I can't do
grep -f file2 file1
because account_id in file2 can match other fields at other positions.
I have tried using pattern in file2:
^.{12}98765.*
but it did not work.

Using awk
$ awk 'NR==FNR{a[$1]=1;next;} substr($0,13,5) in a' file2 file1
12342015010198765hello
12342015010188765hello
How it works
NR==FNR{a[$1]=1;next;}
FNR is the number of lines read so far from the current file and NR is the total number of lines read so far. Thus, if FNR==NR, we are reading the first file which is file2.
Each ID in file2 is saved in array a. Then, we skip the rest of the commands and jump to the next line.
substr($0,13,5) in a
If we reach this command, we are working on the second file, file1.
This condition is true if the 5 character long substring that starts at position 13 is in array a. If the condition is true, then awk performs the default action which is to print the line.
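To see why the fixed-position test matters, here is a minimal sketch that compares it with a plain grep -f. The third line of the sample data is hypothetical (it is not in the question): it carries 98765 at positions 1 - 5 rather than in the account_id field.

```shell
#!/bin/sh
# Sample data in a scratch directory. The third line is a hypothetical
# addition: it contains 98765 at positions 1-5, while positions 13-17
# hold 78765, which is not listed in file2.
tmp=$(mktemp -d)
printf '%s\n' 12342015010198765hello 12342015010188765hello \
    98765201501078765hello > "$tmp/file1"
printf '%s\n' 98765 88765 > "$tmp/file2"

# Plain grep -f matches an ID anywhere on the line, so all 3 lines count.
loose=$(grep -c -f "$tmp/file2" "$tmp/file1")

# awk compares only the 5 characters starting at position 13.
strict=$(awk 'NR==FNR { ids[$1] = 1; next } substr($0, 13, 5) in ids' \
    "$tmp/file2" "$tmp/file1")

echo "grep -f matched $loose lines"
echo "$strict"
rm -rf "$tmp"
```

With this data, grep -f reports three matching lines, while the awk command prints only the two lines whose account_id field is listed in file2.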
Using grep
You mentioned trying
grep '^.{12}98765.*' file2
That uses extended regex syntax which means that -E is required. Also, there is no value in matching .* at the end: it will always match. Thus, try:
$ grep -E '^.{12}98765' file1
12342015010198765hello
To get both lines:
$ grep -E '^.{12}[89]8765' file1
12342015010198765hello
12342015010188765hello
This works because [89]8765 just happens to match the IDs of interest in file2. The awk solution, of course, provides more flexibility in what IDs to match.

Using sed with extended regex:
sed -r 's#.*#/^.{12}&/p#' file2 |sed -nr -f- file1
Using Basic regex:
sed 's#.*#/^.\\{12\\}&/p#' file2 |sed -n -f- file1
Explanation:
sed -r 's#.*#/^.{12}&/p#' file2
will generate an output:
/^.{12}98765/p
/^.{12}88765/p
which is then used as a sed script for the next sed after pipe, which outputs:
12342015010198765hello
12342015010188765hello

Using Grep
The most convenient approach is to put each alternative on a separate line of the pattern file.
You can look at this question:
grep multiple patterns single file argument list too long
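One way to build such a pattern file for the fixed-position case is to prefix every ID with ^.{12}. This is a minimal sketch, assuming the IDs contain no regex metacharacters; the file name patterns is made up for the example.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '%s\n' 12342015010198765hello 12342015010188765hello \
    12342015010178765hello > "$tmp/file1"
printf '%s\n' 98765 88765 > "$tmp/file2"

# Turn each ID into an anchored extended regex, one per line,
# e.g. 98765 becomes ^.{12}98765 (match only at positions 13-17).
sed 's/^/^.{12}/' "$tmp/file2" > "$tmp/patterns"

# -E enables extended regex syntax, -f reads the patterns from the file.
matched=$(grep -E -f "$tmp/patterns" "$tmp/file1")

echo "$matched"
rm -rf "$tmp"
```

This keeps the original file2 untouched and lets grep do the matching, at the cost of one generated helper file.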

Related

Diff to get changed line from second file

I have two files, file1 and file2. I want to print the new lines added to file2 using diff.
file1
/root/a
/root/b
/root/c
/root/d
file2
/root/new
/root/new_new
/root/a
/root/b
/root/c
/root/d
Expected output
/root/new
/root/new_new
I looked at the man page but found no information on this.
If you don't need to preserve the order, you could use the comm command like:
comm -13 <(sort file1) <(sort file2)
comm compares two sorted files and prints three columns of output: first the lines unique to file1, then the lines unique to file2, then the lines common to both. You can suppress any column, so we turn off columns 1 and 3 with -13 and see only the lines unique to the second file.
or you could use grep:
grep -wvFf file1 file2
Here we use -f to have grep get its patterns from file1. We then tell it to treat them as fixed strings with -F instead of as patterns, match whole words with -w, and print only lines with no matches with -v
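A possible refinement, shown here as a sketch rather than as part of the answer above: -w only requires the match to sit on word boundaries, so a pattern such as /root/a also matches the longer path /root/a/b. Using -x (match the whole line) avoids that. The extra path /root/a/b is hypothetical sample data.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '%s\n' /root/a /root/b > "$tmp/file1"
# /root/a/b is new in file2, but it contains /root/a followed by "/".
printf '%s\n' /root/new /root/a /root/a/b /root/b > "$tmp/file2"

# -w: /root/a matches inside /root/a/b, so that new line is dropped.
with_w=$(grep -wvFf "$tmp/file1" "$tmp/file2")

# -x: a pattern must match the entire line, so /root/a/b is kept.
with_x=$(grep -xvFf "$tmp/file1" "$tmp/file2")

echo "with -w:"; echo "$with_w"
echo "with -x:"; echo "$with_x"
rm -rf "$tmp"
```

For the sample data in the question both variants give the same result; they differ only when one line is a word-bounded substring of another.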
The following awk command may also help. It prints all lines that are present in Input_file2 but not in Input_file1.
awk 'FNR==NR{a[$0];next} !($0 in a)' Input_file1 Input_file2
Try using a combination of diff and sed.
The raw diff output is:
$ diff file1 file2
0a1,2
> /root/new
> /root/new_new
Add sed to strip out everything but the lines beginning with ">":
$ diff file1 file2 | sed -n -e 's/^> //p'
/root/new
/root/new_new
This preserves the order. Note that it also assumes you are only adding lines to the second file.

replace different text in different lines using sed

I need to do the following:
I have two files. The first one contains only the numbers of the lines that are going to be modified:
1
2
3
and the second contains the replacement text to be inserted into the original file (final_output.txt):
13e
19f
16a
the original file is
wire1: 0x'd318
wire2: 0x'd415
wire3: 0x'd362
I want to get the following:
wire1: 0x13e
wire2: 0x19f
wire3: 0x16a
This is only a part of final_output.txt, because the file can contain at least 100 lines. I intend to do it using a for loop, but I don't know how to implement it.
awk to the rescue!
assuming the part after the single quote will be replaced.
$ awk -v q="'" 'NR==FNR {a[$1]=$2;next}
FNR in a {sub(q".*",a[FNR])}1' <(paste index rep) file
index is the index file, rep is the replacement file, and file is the original data file.
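The same command can be tried end to end as follows. This is a minimal sketch that pipes paste into awk (reading it as - ) instead of using process substitution, so it also runs in a plain POSIX shell. As a hypothetical variation on the question's data, only lines 1 and 3 are listed in index, to show that unlisted lines pass through unchanged.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '%s\n' 1 3 > "$tmp/index"          # line numbers to modify
printf '%s\n' 13e 16a > "$tmp/rep"        # replacement text, same order
printf '%s\n' "wire1: 0x'd318" "wire2: 0x'd415" "wire3: 0x'd362" > "$tmp/file"

# paste joins the helper files into "line-number<TAB>replacement" pairs.
# awk reads those pairs first (from stdin, named "-"), then the data file.
result=$(paste "$tmp/index" "$tmp/rep" |
    awk -v q="'" 'NR==FNR { a[$1] = $2; next }          # remember pairs
                  FNR in a { sub(q ".*", a[FNR]) }      # replace from quote on
                  1' - "$tmp/file")                     # print every line

echo "$result"
rm -rf "$tmp"
```

Line 2 is not in index, so it is printed as it was; lines 1 and 3 have everything from the single quote onward replaced.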
Another solution, where file1 contains the line numbers, file2 contains the replacement text, and final_output.txt contains your original text:
for ((i=1;i<=$(wc -l < file1);i++)); do sed -i "$(sed -n "${i}p" file1)s#$(sed -n "$(sed -n "${i}p" file1)p" final_output.txt | grep -oP "'.*")#$(sed -n "${i}p" file2)#g" final_output.txt; done
Output
darby#Debian:~/Scrivania$ cat final_output.txt
wire1: 0x13e
wire2: 0x19f
wire3: 0x16a
darby#Debian:~/Scrivania$

Bash - reading two files and searching within files

I have two files, file1 and file2. I want to read each line from file1, and then check whether any of the lines in file2 is present in file1. I am using the following bash script, but it does not seem to be working. What should I change? (I am new to bash scripting).
#!/bin/bash
while read line1
do
echo $line1
while read line2
do
if grep -Fxq "line2" "$1"
then
echo "found"
fi
done < "$2"
done < "$1"
Note: Both files are text files.
Use grep -f
grep -f file_with_search_words file_with_content
Note however that if file_with_search_words contains blank lines everything will be matched. But that can be easily avoided with:
grep -f <(sed '/^$/d' file_with_search_words) file_with_content
From the man page:
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. If this option is used
multiple times or is combined with the -e (--regexp) option, search
for all patterns given. The empty file contains zero patterns, and
therefore matches nothing.
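If an explicit loop is still preferred, a single loop over file2 is enough. The following is a minimal sketch, not the asker's final script: it uses the sample lines from further down this thread plus a hypothetical extra line (Line 7), and it expands the variable as "$line2" where the script in the question passes the literal string "line2".

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '%s\n' 'Line 1' 'Line 3' 'Line 6' 'Line 9' > "$tmp/file1"
printf '%s\n' 'Line 3' 'Line 6' 'Line 7' > "$tmp/file2"

found=$(
    # IFS= and -r keep leading blanks and backslashes intact.
    while IFS= read -r line2; do
        # -F fixed string, -x whole-line match, -q no output;
        # "--" protects against lines that start with a dash.
        if grep -Fxq -- "$line2" "$tmp/file1"; then
            printf '%s\n' "$line2"
        fi
    done < "$tmp/file2"
)

echo "$found"
rm -rf "$tmp"
```

This starts one grep process per line of file2, so for large files the single grep -f call is likely to be noticeably faster.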
You may use the command "comm", which compares two sorted files line by line.
This command shows the lines common to file1 and file2:
comm -12 file1 file2
The only drawback of this command is that you have to sort the files first, like this:
sort file1 > file1sorted
http://www.computerhope.com/unix/ucomm.htm
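Put together, the sort and comm steps could look like this. It is a minimal sketch with unsorted sample data; the name file2sorted is assumed by analogy with file1sorted.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '%s\n' 'Line 9' 'Line 3' 'Line 1' 'Line 6' > "$tmp/file1"
printf '%s\n' 'Line 6' 'Line 3' > "$tmp/file2"

# comm requires both inputs to be sorted in the same collation order.
sort "$tmp/file1" > "$tmp/file1sorted"
sort "$tmp/file2" > "$tmp/file2sorted"

# -12 suppresses columns 1 and 2, leaving only the common lines.
common=$(comm -12 "$tmp/file1sorted" "$tmp/file2sorted")

echo "$common"
rm -rf "$tmp"
```

The output is in sorted order, not in the order of either original file.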
File 1
Line 1
Line 3
Line 6
Line 9
File 2
Line 3
Line 6
awk 'NR==FNR{con[$0];next} $0 in con{print $0}' file1 file2
will give you
Line 3
Line 6
that is, the content of file2 that is also present in file1.
If you wish to ignore blank lines, you can use the following:
awk 'NR==FNR{con[$0];next} !/^$/ && ($0 in con)' file1 file2

Using cut and grep commands in unix

I have a file (file1.txt) with text as:
aaa,,,,,
aaa,10001781,,,,
aaa,10001782,,,,
bbb,10001783,,,,
My file2 contents are:
11111111
10001781
11111222
I need to look up the second field of each line of file1 in file2 and delete the line from file1 if it matches. So the output will be:
aaa,,,,,
aaa,10001782,,,,
bbb,10001783,,,,
Can I use grep and cut commands for this?
This prints lines from file1.txt only if the second field is not in file2:
$ awk -F, 'FNR==NR{a[$1]=1; next;} !a[$2]' file2 file1.txt
aaa,,,,,
aaa,10001782,,,,
bbb,10001783,,,,
How it works
This works by reading file2 and keeping track of all lines seen in an associative array a. Then, lines in file1.txt are printed only if its column 2 is not in a. In more detail:
FNR==NR{a[$1]=1; next;}
When reading file2, set a[$1] to 1 to signal that we have seen the value on this line. We then instruct awk to skip the rest of the commands and start over on the next line.
This section is only run for file2 because file2 is listed first on the command line and FNR==NR only when we are reading the first file listed on the command line. This is because FNR is the number of lines read from the current file and NR is the total number of lines read so far. These two are equal only for the first file.
!a[$2]
When reading file1.txt, a[$2] evaluates to true if column 2 was seen in file2. Since ! is negation, !a[$2] evaluates to true when column 2 was not seen. When this evaluates to true, the line is printed.
Alternative
This is the same logic, expressed in a slightly different style, as suggested in the comments by Tom Fenech:
$ awk -F, 'FNR==NR{a[$1]; next;} !($2 in a)' file2 file1.txt
aaa,,,,,
aaa,10001782,,,,
bbb,10001783,,,,
Solution with grep
$ grep -vf file2 file1.txt
aaa,,,,,
aaa,10001782,,,,
bbb,10001783,,,,
John1024's awk solution would be faster for large files, though. Note that grep -vf matches the IDs anywhere on the line, not only in the second field.
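To illustrate the difference between matching anywhere on the line and matching only the second field, here is a minimal sketch. The line starting with ccc is hypothetical: it carries the ID from file2 in its third field.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '%s\n' 'aaa,10001781,,,,' 'aaa,10001782,,,,' \
    'ccc,20000000,10001781,,,' > "$tmp/file1.txt"
printf '%s\n' 10001781 > "$tmp/file2"

# grep removes every line that contains the ID in any position.
by_grep=$(grep -vf "$tmp/file2" "$tmp/file1.txt")

# awk removes a line only when its second field is listed in file2.
by_awk=$(awk -F, 'FNR==NR { a[$1]; next } !($2 in a)' \
    "$tmp/file2" "$tmp/file1.txt")

echo "grep:"; echo "$by_grep"
echo "awk:";  echo "$by_awk"
rm -rf "$tmp"
```

Here grep also drops the ccc line, while awk keeps it because its second field is 20000000.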

Compare two files,delete a line if matches found

I want to compare two files.
If the values from file2 match the first two columns of a line in file1, I need to delete that whole line from file1 and print the result as shown below.
Below contains values of file1:
1,aplle,melle,cyborg
2,bplle,less,vgm
3,minipl,vicy,bgm
4,tag,mob,calic
6,Centurion,sa,hh
Below contains values of file2
2,bplle
4,tag
5,Centurion
And output must contains below:
1,aplle,melle,cyborg
3,minipl,vicy,bgm
6,Centurion,sa,hh
Is it possible to achieve this with awk?
This awk should work:
awk -F, 'FNR==NR{a[$1,$2];next} !(($1,$2) in a)' file2 file1
1,aplle,melle,cyborg
3,minipl,vicy,bgm
6,Centurion,sa,hh
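The comma inside a[$1,$2] builds a single composite key: awk joins the two values with the built-in SUBSEP character, so both columns have to match together. A small sketch of that behaviour, using a hypothetical array named seen:

```shell
#!/bin/sh
keyinfo=$(awk 'BEGIN {
    seen["5", "Centurion"]              # stored as "5" SUBSEP "Centurion"
    for (k in seen) {
        n = split(k, parts, SUBSEP)     # recover the two components
        print n ":" parts[1] ":" parts[2]
    }
    # Same second column, different first column: not the same key.
    print ((("6", "Centurion") in seen) ? "present" : "absent")
}')

echo "$keyinfo"
```

This is why the line 6,Centurion,sa,hh survives even though file2 contains 5,Centurion.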
This would also work: grep -Fwvf file2 file1
-F
Interpret PATTERN as a list of fixed strings,
-w
Select only those lines containing matches that form whole words.
-v
Invert the sense of matching, to select non-matching lines.
-f FILE
Obtain patterns from FILE, one per line.

Resources