grep command to check whether two strings appear in a specific order - bash

I was trying to write a shell script to check if two strings are present in a file, also, I'm checking if they are in specific order.
Let's say the file.txt has the following text:
bcd
def
abc
I'm using the command : grep -q abc file.txt && grep -l bcd file.txt
This is giving the output file.txt when the two strings are present in any order. I'd like to get the output only if abc comes before bcd. Please help me with this

With grep PCRE option:
grep -Pzl 'abc[\s\S]*bcd' file.txt
-z - treat input and output data as sequences of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline.
If PCRE (-P option) is not supported on your side:
grep -zl 'abc.*bcd' file.txt

You can use awk instead of grep to match bcd only after abc:
awk '/abc/{p=NR} p && /bcd/{print FILENAME; exit}' file

awk -v RS='' '/abc.*bcd/{print FILENAME}' file.txt
Setting RS (the record separator) to the empty string switches awk from its default of one record per line to paragraph mode, where records are separated by blank lines. If the file contains no blank lines, the whole file is read as a single record, so /abc.*bcd/ is enough to tell whether abc comes before bcd.
Note that this fails if there is an empty line between abc and bcd, because the blank line splits them into different records and the pattern never sees both at once.
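The ordering check can also be wrapped in a small helper that reports the result through its exit status. This is only a sketch: the name in_order is made up for illustration, it compares fixed strings rather than regexes, and it assumes the second string has to appear on a later line than the first.

```shell
#!/bin/bash
# in_order FIRST SECOND FILE  (hypothetical helper, not from the answers above)
# Exit status 0 if some line containing SECOND comes after a line containing FIRST.
# Note: awk -v interprets backslash escapes, so this assumes plain strings.
in_order() {
    awk -v first="$1" -v second="$2" '
        seen && index($0, second) { found = 1; exit }  # SECOND on a later line
        index($0, first)          { seen = 1 }         # remember that FIRST occurred
        END { exit !found }                            # 0 = in order, 1 = not
    ' "$3"
}

# Usage: print the file name only when abc precedes bcd.
# in_order abc bcd file.txt && echo file.txt
```

Because the result is an exit status, the helper can be used directly in if or && chains, and blank lines in the file do not affect it.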

Related

Duplicate entries in file

I have a file with content as below,
123 ABC
12345 ABC-test
In the shell script, I need an exact entry instead of two duplicate results, but unable to get the exact entry.
For example:
grep "ABC"
returns both the entries, but I want a specific entry, i.e., if I search for "ABC", I should get only "123 ABC" and not the other entry.
Since you consider words to be whitespace-separated chunks, it is easier to use awk here since it reads lines (records) and splits them into fields (non-whitespace chunks) by default:
awk '$2=="ABC"' file > newfile
awk '/([[:space:]]|^)ABC([[:space:]]|$)/' file > newfile
Here, the first awk will output all lines where the second word is ABC. The second awk outputs all lines with ABC followed/preceded with a whitespace or at start/end of the line.
See the online demo:
#!/bin/bash
s='123 ABC
12345 ABC-test'
awk '$2=="ABC"' <<< "$s"
awk '/([[:space:]]|^)ABC([[:space:]]|$)/' <<< "$s"
Output (each command prints the same line):
123 ABC
123 ABC
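You have to build a proper regex (regular expression). In this case you want only those lines where ABC is not surrounded by other word characters (it sits on word boundaries):
grep '\bABC\b' file
\b is a GNU extension, and the -e switch is not needed for it: -e only introduces the pattern, while -E is what enables extended regular expressions. Be aware that - is not a word character, so this pattern still matches "12345 ABC-test"; to require whitespace around ABC, use grep -E '(^|[[:space:]])ABC([[:space:]]|$)' file. See also a regex tutorial, e.g. https://www.regular-expressions.info/tutorial.html.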
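If the word should match only as a complete whitespace-separated field, wherever it is on the line, a loop over the fields avoids regex boundaries altogether. A minimal sketch follows; whole_field is a made-up name, and it assumes fields are separated by blanks as in the sample.

```shell
#!/bin/bash
# whole_field WORD FILE  (hypothetical helper)
# Print every line in which WORD appears as a complete whitespace-separated field.
whole_field() {
    awk -v word="$1" '{
        for (i = 1; i <= NF; i++)
            # Appending "" forces a string comparison even for numeric-looking fields.
            if (($i "") == word) { print; next }
    }' "$2"
}

# Usage:
# whole_field ABC file > newfile
```

With the two sample lines, this keeps "123 ABC" and drops "12345 ABC-test", because ABC-test is a different field from ABC.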

Making bash output a certain word from a .txt file

I have a question on Bash:
Like the title says, I require bash to output a certain word, depending on where it is in the file. In my explicit example I have a simple .txt file.
I already found out that you can count the number of words within a file with the command:
wc -w < myFile.txt
An output example would be:
78501
There certainly is also a way to make "cat" to only show word number x. Something like:
cat myFile.txt | wordno. 3125
desired-word
Notice, that I will welcome any command, that gets this done, not only cat.
Alternatively or in addition, I would be happy to know how you can make certain characters in a file show, based on their place in it. Something like:
cat myFile.txt | characterno. 2342
desired-character
I already know how you can achieve this with a variable:
a="hello, how are you"
echo ${a:9:1}
w
The only problem is that a variable can only be so long. If it has to hold a whole .txt file, this won't work.
I look forward to your answers!
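You could use awk for this job: it splits its input at whitespace and prints field number $wordnumber. tr is used to turn the newlines into spaces so that the whole file becomes one line (deleting them outright would glue the last word of each line to the first word of the next):
tr '\n' ' ' < myFile.txt | awk -v wordnumber=5 '{ print $wordnumber }'
And if you want, for example, the 5th character, you could do it like so: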
head -c 5 myFile.txt | tail -c 1
Since you have not shown a sample of the input or the expected output, I couldn't test this. You could do it with awk, for example:
awk 'FNR==1{print substr($0,2342,1);next}' Input_file
FNR==1 tells awk to look only at the first line, and substr($0,2342,1) takes 1 character starting at position 2342. Increase the last argument if you need more characters.
With gawk:
awk 'BEGIN{RS="[[:space:]]+"} NR==12345' file
or
gawk 'NR==12345' RS="[[:space:]]+" file
I'm setting the record separator to a sequence of whitespace characters, which includes newlines, and then printing the 12345th record.
To improve the average performance you can exit the script once the match is found:
gawk 'BEGIN{RS="[[:space:]]+"}NR==12345{print;exit}' file
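For systems without gawk, the same two lookups can be sketched with tr, awk, head and tail. The function names nth_word and nth_char are invented for this example, and nth_char counts bytes, so it assumes single-byte characters.

```shell
#!/bin/bash
# nth_word N FILE  (hypothetical helper)
# Print the N-th whitespace-separated word of FILE.
nth_word() {
    # Put every word on its own line, then count the non-empty lines.
    tr -s '[:space:]' '\n' < "$2" |
        awk -v n="$1" 'NF && ++count == n { print; exit }'
}

# nth_char N FILE  (hypothetical helper)
# Print the N-th byte of FILE, counting from 1.
nth_char() {
    head -c "$1" "$2" | tail -c 1
}

# Usage:
# nth_word 3125 myFile.txt
# nth_char 2342 myFile.txt
```

Both functions stream the file, so they do not depend on how much text a shell variable can hold.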

searching/extracting specific word not the complete line in unix

I would like to search a word in a file in Unix that should return only the word not the complete line.
For ex:
Sample.text:
Hello abc hi aeabcft 123abc OK
Expected output:
abc
aeabcft
123abc
If I search abc in file Sample.txt using grep, it will return complete line but I want the words that contains abc
You can use grep -Eo with an extended regex to find all matching words:
grep -Eo '\b[[:alnum:]]*abc[[:alnum:]]*\b' Sample.text
abc
aeabcft
123abc
As per man grep:
-o, --only-matching
Prints only the matching part of the lines.
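If you also need to do other processing on the file that grep can't accomplish, you could use awk to print only the regex match on each line. Note that the three-argument form of match() is a gawk extension, and that this prints only the first match per line: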
awk -v r="abc" '{m=match($0,r,a)}m{print a[0]}' file
Otherwise I'd just use anubhava's grep -o suggestion as it's shorter and clearer.
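A portable awk variant that prints every whole word containing the text could look like the sketch below. The name words_containing is made up, the search text is treated as a fixed string, and a "word" here means a whitespace-separated field, so attached punctuation stays part of the word.

```shell
#!/bin/bash
# words_containing TEXT FILE  (hypothetical helper)
# Print each whitespace-separated word of FILE that contains TEXT, one per line.
words_containing() {
    awk -v text="$1" '{
        for (i = 1; i <= NF; i++)
            if (index($i, text)) print $i   # index() > 0 means TEXT occurs in the field
    }' "$2"
}

# Usage:
# words_containing abc Sample.text
```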

Bash - remove all lines beginning with 'P'

I have a text file that's about 300KB in size. I want to remove all lines from this file that begin with the letter "P". This is what I've been using:
> cat file.txt | egrep -v P*
That isn't outputting to console. I can use cat on the file without another other commands and it prints out fine. My final intention being to:
> cat file.txt | egrep -v P* > new.txt
No error appears, it just doesn't print anything out and if I run the 2nd command, new.txt is empty.
I should say I'm running Windows 7 with Cygwin installed.
Explanation
use ^ to anchor your pattern to the beginning of the line ;
delete lines matching the pattern using sed and the d command.
Solution #1
cat file.txt | sed '/^P/d'
Better solution
Use sed-only:
sed '/^P/d' file.txt > new.txt
With awk:
awk '!/^P/' file.txt
Explanation
The condition starts with ! (negation), which negates the pattern that follows;
/^P/ means "match all lines starting with a capital P",
so the negated pattern means "select lines that do not start with a capital P".
Finally, it leverages awk's behavior when the { … } action block is missing, which is to print every record that satisfies the condition.
So, to rephrase, it ignores lines starting with a capital P and prints everything else.
Note
sed is line oriented and awk column oriented. For your case you should use the first one, see Edouard Lopez's response.
Use sed with in-place editing (for GNU sed; this will also work under your Cygwin):
sed -i '/^P/d' file.txt
BSD (Mac) sed
sed -i '' '/^P/d' file.txt
Use start of line mark and quotes:
cat file.txt | egrep -v '^P.*'
P* means P zero or more times so together with -v gives you no lines
^P.* means start of line, then P, and any char zero or more times
Quoting is needed to prevent shell expansion.
This can be shortened to
egrep -v ^P file.txt
because .* is not needed, therefore quoting is not needed and egrep can read data from file.
As we don't use extended regular expressions grep will also work fine
grep -v ^P file.txt
Finally
grep -v ^P file.txt > new.txt
This works:
cat file.txt | egrep -v -e '^P'
-e indicates expression.
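If the prefix is not a fixed letter but comes from a variable, a literal comparison sidesteps both the quoting and the regex-escaping questions. The following is a sketch under that assumption; drop_lines_starting_with is an invented name.

```shell
#!/bin/bash
# drop_lines_starting_with PREFIX FILE  (hypothetical helper)
# Print every line of FILE that does not begin with the literal string PREFIX.
drop_lines_starting_with() {
    # index() returns 1 only when PREFIX sits at the very start of the line.
    awk -v prefix="$1" 'index($0, prefix) != 1' "$2"
}

# Usage:
# drop_lines_starting_with P file.txt > new.txt
```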

How to ignore all lines before a match occurs in bash?

I would like to ignore all lines which occur before a match in bash (also ignoring the matched line). Example of input could be
R1-01.sql
R1-02.sql
R1-03.sql
R1-04.sql
R2-01.sql
R2-02.sql
R2-03.sql
and if I match R2-01.sql in this already sorted input I would like to get
R2-02.sql
R2-03.sql
Many ways possible. For example: assuming that your input is in list.txt
PATTERN="R2-01.sql"
sed "0,/$PATTERN/d" <list.txt
Because 0,/pattern/ works only in GNU sed (e.g. it doesn't work on OS X), here is a workaround. ;)
PATTERN="R2-01.sql"
(echo "dummy-line-to-the-start" ; cat - ) < list.txt | sed "1,/$PATTERN/d"
This adds one dummy line to the start, so the real pattern can only be on line 2 or later and 1,/pattern/ works: it deletes everything from line 1 (the dummy one) up to the pattern.
Or you can print lines after the pattern and delete the 1st, like:
sed -n '/pattern/,$p' < list.txt | sed '1d'
with awk, e.g.:
awk '/pattern/,0{if (!/pattern/)print}' < list.txt
or, my favorite, use this perl command:
perl -ne 'print unless 1../pattern/' < list.txt
This correctly deletes just the first line when the pattern is on the first line.
Another solution is reverse-delete-reverse:
tail -r < list.txt | sed '/pattern/,$d' | tail -r
If you have the tac command, use it instead of tail -r. The interesting thing is that /pattern/,$d works when the pattern is on the last line, but 1,/pattern/d doesn't when it is on the first.
How to ignore all lines before a match occurs in bash?
The question headline and your example don't quite match up.
Print all lines from "R2-01.sql" in sed:
sed -n '/R2-01.sql/,$p' input_file.txt
Where:
-n suppresses printing the pattern space to stdout
/ starts and ends the pattern to match (regular expression)
, separates the start of the range from the end
$ addresses the last line in the input
p echoes the pattern space in that range to stdout
input_file.txt is the input file
Print all lines after "R2-01.sql" in sed:
sed '1,/R2-01.sql/d' input_file.txt
1 addresses the first line of the input
, separates the start of the range from the end
/ starts and ends the pattern to match (regular expression)
d deletes the pattern space in that range
input_file.txt is the input file
Everything not deleted is echoed to stdout.
This is a little hacky, but it's easy to remember for quickly getting the output you need:
$ grep -A99999 $match $file
Obviously you need to pick a value for -A that's large enough to match all contents; if you use a too-small value the output will be silently truncated.
To ensure you get all output you can do:
$ grep -A"$(wc -l < "$file")" "$match" "$file"
Of course at that point you might be better off with the sed solutions, since they don't require two reads of the file.
And if you don't want the matching line itself, you can simply pipe this command into tail -n +2 to skip the first line of output.
awk -v pattern=R2-01.sql '
print_it {print}
$0 ~ pattern {print_it = 1}
'
You can do it with this, but I think jomo666's answer is better.
sed -nr '/R2-01.sql/,${/R2-01/d;p}' <<END
R1-01.sql
R1-02.sql
R1-03.sql
R1-04.sql
R2-01.sql
R2-02.sql
R2-03.sql
END
Perl is another option:
perl -ne 'if ($f){print} elsif (/R2-01\.sql/){$f++}' sql
To pass in the regex as an argument, use -s to enable a simple argument parser
perl -sne 'if ($f){print} elsif (/$r/){$f++}' -- -r=R2-01\\.sql file
This can be accomplished with grep, by printing a large enough context following the $match. This example will output the first matching line followed by 999,999 lines of "context".
grep -A999999 $match $file
For added safety (in case $match begins with a hyphen, say) you should use -e to force $match to be used as the pattern. Use double quotes so that the variable is still expanded:
grep -A999999 -e "$match" "$file"
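Several of the approaches above either need a guessed context size or behave differently when the match is on the first line. A small awk wrapper avoids both; this is a sketch only, after_match is a made-up name, and the pattern is treated as a fixed string so the dots in R2-01.sql match literally.

```shell
#!/bin/bash
# after_match TEXT FILE  (hypothetical helper)
# Print every line of FILE that follows the first line containing TEXT.
after_match() {
    awk -v text="$1" '
        found            { print }      # already past the match: print the line
        index($0, text)  { found = 1 }  # the matching line itself is not printed
    ' "$2"
}

# Usage:
# after_match R2-01.sql list.txt
```

The order of the two rules matters: because the print rule comes first, the flag is set only after the matching line has been skipped.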

Resources