Grep needs to match a specific word - shell

I want to use grep in a bash script to get the line that contains only the word "version" from a file. The problem is that a few lines further down in that file there is another line containing the word "dlversion".
I am piping the grep output into a cut command, and the result will be saved in a variable.
It either saves nothing into the variable or it saves both lines. I've already tried several methods that I found, but none of them have worked.
grep -Fx version /path/to/file.txt | cut -c9-
output = nothing
grep '^version$' /path/to/file.txt | cut -c9-
output = nothing
grep "version" /path/to/file.txt | cut -c9-
output = both lines
also tried
grep -w "version " /path/to/file.txt | cut -c9-
output = nothing
I also tried to use -F, -x on their own which also caused the variable to not have a value.

You have a few options, depending on your version of grep.
If supported, the best option is to use word boundaries \b either side of your word:
grep '\bversion\b' /path/to/file.txt
Or:
grep '\<version\>' /path/to/file.txt
Where \< and \> match the empty string at the start and end of a word respectively.
Otherwise, you can create your own set of characters that you consider to not be a word:
grep -E '(^|[[:space:][:punct:]])version' /path/to/file.txt
This matches "version", preceded by either the start of the line or any type of space or punctuation.
In your specific case, you could use something like this:
grep -E '(^|[^l])version' /path/to/file.txt
This matches "version" preceded by either the start of the line or anything other than an "l".
In response to your comment:
^ matches the start of the line.
| means "or".
[^l] is a bracket expression, where the ^ as the first character means "not" (so this matches every character other than "l").
The parentheses are used to create a group, so that the "or" only applies to this part of the pattern.
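As a rough end-to-end check of the word-boundary approach, the sketch below builds a throwaway file and captures the value into a variable. The file contents and the key=value layout are assumptions made for illustration (the question does not show the real file), and -w is assumed to be available, as it is in GNU and BSD grep.

```shell
# Hypothetical sample file; the layout of the real file is not shown in the question.
sample=$(mktemp)
printf '%s\n' 'name=demo' 'version=1.2.3' 'dlversion=4.5.6' > "$sample"

# -w only accepts matches that form a whole word, so "dlversion" is skipped.
# cut -c9- then drops the first eight characters ("version=").
version=$(grep -w 'version' "$sample" | cut -c9-)

printf '%s\n' "$version"
rm -f "$sample"
```

If the real file uses a different separator or column layout, the cut offset would need adjusting.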

If the more sophisticated tricks fail, use brute force: pipe the result through a second grep process that filters out the lines containing the unwanted word (the -v option inverts the match):
grep 'version' sourcefile | grep -v 'dlversion' > destination

Related

Escape "./" when using sed

I wanted to use grep to exclude words from $lastblock by using a pipeline, but I found that grep works only for files, not for stdout output.
So, here is what I'm using:
lastblock="./2.json"
echo $lastblock | sed '1,/firstmatch/d;/.json/,$d'
I want to exclude ./ and .json, keeping only what is between.
This sed command is correct for this purpose, but how to escape the ./ replacing firstmatch so it can work?
Thanks in advance!
Use bash's Parameter Substitution
lastblock="./2.json"
name="${lastblock##*/}" # strips from the beginning until last / -> 2.json
base="${name%.*}" # strips from the last . to the end -> 2
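The two expansions can be wrapped in a small function if the same stripping is needed in several places. This is only a sketch; the function name strip_dir_and_ext and the second sample path are made up for illustration.

```shell
# strip_dir_and_ext is a hypothetical helper name, not part of bash.
strip_dir_and_ext() {
  local name="${1##*/}"        # remove the longest prefix ending in /
  printf '%s\n' "${name%.*}"   # remove the shortest suffix starting with .
}

strip_dir_and_ext "./2.json"
strip_dir_and_ext "/var/data/block.tar.gz"
```

Because %.* removes the shortest matching suffix, a name with two extensions such as block.tar.gz loses only the last one.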
but I found that grep works only for files, not for stdout output.
Here it is (if your grep supports the -P flag):
lastblock="./2.json"
echo "$lastblock" | grep -Po '(?<=\./).*(?=\.)'
but how to escape the ./
With sed(1), escape it using a backslash (\):
lastblock="./2.json"
echo "$lastblock" | sed 's/^\.\///;s/\..*$//'
Or use a different delimiter like a pipe |
sed 's|^\./||;s|\..*$||'
with awk
lastblock="./2.json"
echo "$lastblock" | awk -F'[./]+' '{print $2}'
Starting with bash v3, regular expression pattern matching is supported using the =~ operator inside the [[ ... ]] keyword.
lastblock="./2.json"
regex='^\./([[:digit:]]+)\.json'
[[ $lastblock =~ $regex ]] && echo "${BASH_REMATCH[1]}"
That said, a parameter expansion (P.E.) should suffice for this purpose.
I wanted to use grep to exclude words from $lastblock by using a pipeline, but I found that grep works only for files, not for stdout output.
Nonsense. grep works the same for the same input, regardless of whether it is from a file or from the standard input.
So, here is what I'm using:
lastblock="./2.json"
echo $lastblock | sed '1,/firstmatch/d;/.json/,$d'
I want to exclude ./ and .json, keeping only what is between. This sed
command is correct for this purpose,
That sed command is nowhere near correct for the stated purpose. It has this effect:
delete every line from the very first one up to and including the next subsequent one that matches the regular expression /firstmatch/, AND
delete every line from the first one matching the regular expression /.json/ to the last one of the file (and note that . is a regex metacharacter).
To remove part of a line instead of deleting a whole line, use an s/// command instead of a d command. As for escaping, you can escape a character to sed by preceding it with a backslash (\), which itself must be quoted or escaped to protect it from interpretation by the shell. Additionally, most regex metacharacters lose their special significance when they appear inside a character class, which I find to be a more legible way to include them in a pattern as literals. For example:
lastblock="./2.json"
echo "$lastblock" | sed 's/^[.]\///; s/[.]json$//'
That says to remove the literal characters ./ appearing at the beginning of the (any) line, and, separately, to remove the literal characters .json appearing at the end of the line.
Alternatively, if you want to modify only those lines that both start with ./ and end with .json then you can use a single s command with a capturing group and a backreference:
lastblock="./2.json"
echo "$lastblock" | sed 's/^[.]\/\(.*\)[.]json$/\1/'
That says that on lines that start with ./ and end with .json, capture everything between those two and replace the whole line with the captured part alone.
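To see how the two substitutions differ, it can help to feed them a line that starts with ./ but does not end in .json. The second input line below is made up for the comparison.

```shell
# The second line is a made-up example that does not end in .json.
input=$(printf '%s\n' './2.json' './notes.txt')

# Two independent substitutions: the leading ./ is removed from every line.
two_step=$(printf '%s\n' "$input" | sed 's/^[.]\///; s/[.]json$//')

# One anchored substitution: lines that do not match the whole pattern are left alone.
one_step=$(printf '%s\n' "$input" | sed 's/^[.]\/\(.*\)[.]json$/\1/')

printf 'two-step:\n%s\n' "$two_step"
printf 'one-step:\n%s\n' "$one_step"
```

The two-step form turns ./notes.txt into notes.txt, while the one-step form prints it unchanged.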
You can use another delimiter, such as #, when you want to avoid escaping slashes.
You can capture the part that matches and reuse it in the replacement.
Use [.] so that the dot is matched literally instead of matching any character.
echo "$lastblock" | sed -r 's#[.]/(.*)[.]json#\1#'
Solution!
Just discovered today the tr command thanks to this legendary, unrelated answer.
When I searched all over Google for how to exclude "." and "/", none of the StackOverflow answers helped.
So, to remove characters from the output of a command, just append this pipe:
| tr -d "{character-emoji-anything-you-want-to-exclude}"
So, a full working and simple sample:
echo "./2.json" | tr -d "/" | tr -d "." | tr -d "json"
And done!
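One caveat to keep in mind: tr -d treats its argument as a set of single characters, not as a string, so tr -d "json" deletes every j, s, o and n wherever they occur. The file name below is invented to make the effect visible, and the parameter expansion is shown only for comparison.

```shell
# "jason" is an invented name chosen because it contains j, s, o and n.
with_tr=$(printf '%s\n' './jason.json' | tr -d '/' | tr -d '.' | tr -d 'json')

# For comparison, parameter expansion removes only the directory and the extension.
path='./jason.json'
name=${path##*/}
with_expansion=${name%.*}

printf 'tr: %s\n' "$with_tr"
printf 'expansion: %s\n' "$with_expansion"
```

The tr pipeline leaves only "a" here, so it is safe only when the part to keep contains none of the deleted characters, as in "./2.json".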

Clean output using sed

I have a file that begins with this kind of format
INFO|NOT-CLONED|/folder/another-folder/another-folder|last-folder-name|
What I need is to read the file and get this output:
INFO|NOT-CLONED|last-folder-name
I have this so far:
cat clone_them.log | grep 'INFO|NOT-CLONED' | sed -E 's/INFO\|NOT-CLONED\|(.*)/g'
But it is not working as intended.
NOTE: the last "another-folder" and "last-folder-name" are the same.
If you want a sed solution:
$ sed -En 's/(INFO\|NOT-CLONED\|).*\|([^|]*)\|$/\1\2/p' file
INFO|NOT-CLONED|last-folder-name
How it works:
-E
Use extended regex
-n
Don't print unless we explicitly tell it to.
s/(INFO\|NOT-CLONED\|).*\|([^|]*)\|$/\1\2/p
Look for lines that include INFO|NOT-CLONED| (save this in group 1) followed by anything, .*, followed by | followed by any characters not |, [^|]* (saved in group 2), followed by | at the end of the line. The replacement text is group 1 followed by group 2.
The p option tells sed to print the line if the match succeeds. Since the substitution only succeeds for lines that contain INFO|NOT-CLONED|, this eliminates the need for an extra grep process.
Variation: Returning just the last-folder-name
To just get the last-folder-name without the INFO|NOT-CLONED, we need only remove \1 from the output:
$ sed -En 's/(INFO\|NOT-CLONED\|).*\|([^|]*)\|$/\2/p' file
last-folder-name
Since we no longer need the first capture group, we could simplify and remove the now unneeded parens so that the only capture group is the last folder name:
$ sed -En 's/INFO\|NOT-CLONED\|.*\|([^|]*)\|$/\1/p' file
last-folder-name
It's simpler in awk, as the input file is properly delimited by the | symbol. You need to tell awk that the input fields are separated by | and that the output should also remain separated by |, using FS and OFS respectively.
awk 'BEGIN{FS=OFS="|"}/INFO\|NOT-CLONED/{print $1,$2,$(NF-1)}' clone_them.log
INFO|NOT-CLONED|last-folder-name
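A quick way to check the sed and awk approaches against each other is to run both on a small log that also contains a line that should be ignored. The second log line below is a hypothetical entry added for that purpose, and sed -E is assumed to be supported (it is in GNU and BSD sed).

```shell
# The second log line is a hypothetical non-matching entry.
log=$(mktemp)
printf '%s\n' \
  'INFO|NOT-CLONED|/folder/another-folder/another-folder|last-folder-name|' \
  'INFO|CLONED|/folder/other|other|' > "$log"

# sed: keep group 1 (the prefix) and group 2 (the last non-empty field).
sed_out=$(sed -En 's/(INFO\|NOT-CLONED\|).*\|([^|]*)\|$/\1\2/p' "$log")

# awk: the line ends in |, so the last real field is $(NF-1).
awk_out=$(awk 'BEGIN{FS=OFS="|"} /INFO\|NOT-CLONED/{print $1,$2,$(NF-1)}' "$log")

printf '%s\n%s\n' "$sed_out" "$awk_out"
rm -f "$log"
```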

Read the word after a specific word on the same line when there is no space between them

How can I extract a word that comes after a specific word in bash ? More precisely, I have a file which has a line which looks like this:
Demo.txt
IN=../files/d
out=../files/d
dataload
name
I want to read "d" from the lines above.
sed -n '/\/files\// s~.*/files/\([^.]*\)\..*~\1~p' file
This code works if the line contains a ".", for example:
IN=../files/d.txt
so it prints "d".
Here, however, "d" is not followed by a "." as an end delimiter, so I want to read until the end of the line.
Input:
Demo.txt
IN=../files/d
out=../files/d
dataload
name
Desired output:
d
d
code: in bash
You could use GNU grep with PCRE :
grep -oP '/files/\K[^.]+' file
The -P flag makes grep use PCRE, the -o makes it display only the matched part rather than the full line, and the \K in the regex omits what precedes from the displayed matched part.
Alternatively if you don't have access to GNU grep, the following perl command will have the same effect :
perl -nle 'print $& if m{/files/\K[^.]+}' file
This sed variant should work for you:
sed -n '/\/files\// s~.*/files/\([^.]*\).*~\1~p' file
d
d
The only change from the earlier sed command is that it no longer requires a \. right after the first capture group.
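The sed variant can be tried on a file that mixes both cases, with and without an extension. The sample lines are based on the question, with one path given a .txt suffix as an assumption so that both cases are exercised.

```shell
# Sample lines based on the question; the .txt suffix on the second line is added for the test.
demo=$(mktemp)
printf '%s\n' 'IN=../files/d' 'out=../files/d.txt' 'dataload' 'name' > "$demo"

# [^.]* stops at the first dot if there is one; otherwise it runs to the end of the line.
words=$(sed -n '/\/files\// s~.*/files/\([^.]*\).*~\1~p' "$demo")

printf '%s\n' "$words"
rm -f "$demo"
```

Lines without /files/ are skipped because of -n together with the address in front of the s command.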
When you don't want to think about a single command solution, you can use
grep -Eo "/files/." Demo.txt | cut -d/ -f3

Grep (fgrep) bash exact match end of line

I have the below example file
d41d8cd98f00b204e9800998ecf8427e /home/abid/Testing/FileNamesTest/apersand $ file
d41d8cd98f00b204e9800998ecf8427e /home/abid/Testing/FileNamesTest/file[with square brackets]
d41d8cd98f00b204e9800998ecf8427e /home/abid/Testing/FileNamesTest/~$tempfile
017a3635ccb76250b2036d6aea330c80 /home/abid/Testing/FileNamesTest/FileThree
217a3635ccb76250b2036d6aea330c80 /home/abid/Testing/FileNamesTest/FileThreeDays
d41d8cd98f00b204e9800998ecf8427e /home/abid/Testing/FileNamesTest/single quote's
I want to grep for the last part of each line (the file name), and I need an exact match for it:
grep FileThree$ files.md5
017a3635ccb76250b2036d6aea330c80 /home/abid/Testing/FileNamesTest/FileThree
This gives back an exact match and doesn't find "FileThreeDays", which is what I'm after. But because some of the file names contain square brackets, I have to use grep -F or fgrep. However, using fgrep like the above doesn't work; it returns nothing.
How can I exact match the last part of the line using fgrep whilst still honoring the special characters above ~ / $ / ' / [ ] etc...or any other method using maybe awk...
Further:
Using fgrep without the $ returns both of these files. I only want an exact match (as with the $ in the grep above), but with fgrep the $ doesn't return anything.
grep -F FileThree files.md5
017a3635ccb76250b2036d6aea330c80 /home/abid/Testing/FileNamesTest/FileThree
217a3635ccb76250b2036d6aea330c80 /home/abid/Testing/FileNamesTest/FileThreeDays
I can't tell all the details from your question, but it sounds like you can use grep and just escape the special characters: grep 'File\[Three\]Days$'
If you want to use fgrep, though, you can use some tr tricks to help you. If all you want is the filename (without the directory name), you can do something like
cat files.md5 | tr '/' '\n' | fgrep -x FileThreeDays
That tr command replaces slashes with newlines, so it puts each file name on its own line. The -x option then makes fgrep match whole lines only, so it finds the file name only when the entire name is FileThreeDays (a search for FileThree would not also return FileThreeDays).
If you want the full filename with directory, it's a little trickier, but a similar approach will work. Assuming that there's always a double space between the SHA and the filename, and that there aren't any filenames with double spaces or tab characters in them, you can try something like this:
sed 's/  /\t/' files.md5 | tr '\t' '\n' | fgrep FileThreeDays
That sed command converts the first double space on each line to a tab (\t in the replacement is understood by GNU sed). The tr command turns those tabs into newlines (the same trick as above).
I would use awk:
awk '{$1="";print}' file
$1="" cuts the first column to an empty string, and print prints the modified line - which only contains the filename now.
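However, this leaves a blank space at the start of each line, and because assigning to $1 makes awk rebuild the line, runs of spaces inside a file name are squeezed to a single space. Setting the output field separator to an empty string removes the leading space, but it also removes the spaces inside file names such as "apersand $ file":
awk '{$1="";print}' OFS="" file
To strip only the checksum and leave the rest of the line untouched, use sub instead:
awk '{sub(/^[^ ]+ +/,""); print}' file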
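Another option, sketched here as an assumption rather than something proposed in the answers above, is to skip regular expressions entirely and compare the end of each line with a literal string in awk. The helper name match_basename and the WANTED environment variable are illustrative choices; passing the name through ENVIRON avoids the backslash-escape processing that awk -v applies to its value.

```shell
# match_basename NAME FILE: print the lines of FILE that end in "/NAME".
# Hypothetical helper; the comparison is a plain string test, so [ ] $ ~ ' need no escaping.
match_basename() {
  WANTED="$1" awk '
    BEGIN { suffix = "/" ENVIRON["WANTED"] }
    {
      start = length($0) - length(suffix) + 1
      if (start >= 1 && substr($0, start) == suffix) print
    }' "$2"
}

# Small sample taken from the listing in the question.
sums=$(mktemp)
printf '%s\n' \
  '017a3635ccb76250b2036d6aea330c80  /home/abid/Testing/FileNamesTest/FileThree' \
  '217a3635ccb76250b2036d6aea330c80  /home/abid/Testing/FileNamesTest/FileThreeDays' \
  'd41d8cd98f00b204e9800998ecf8427e  /home/abid/Testing/FileNamesTest/file[with square brackets]' > "$sums"

match_basename 'FileThree' "$sums"
match_basename 'file[with square brackets]' "$sums"
rm -f "$sums"
```

Prefixing the name with / is what keeps a search for FileThree from also matching a file called MyFileThree.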

awk/sed extract string from between patterns

I know there has probably been a few hundred forms of this question asked on stackoverflow, but I can't seem to find a suitable answer to my question.
I'm trying to parse through the /etc/ldap.conf file on a Linux box so that I can specifically pick out the description fields from between (description= and ):
-bash-3.2$ grep '^nss_base_passwd' /etc/ldap.conf
nss_base_passwd ou=People,dc=ca,dc=somecompany,dc=com?one?|(description=TD_FI)(description=TD_F6)(description=TD_F6)(description=TRI_142)(description=14_142)(description=REX5)(description=REX5)(description=1950)
I'm looking to extract these into their own list with no duplicates:
TD_FI
TD_F6
TRI_142
14_142
REX5
1950
(or all on one line with a proper delimiter)
I had played with sed for a few hours but couldn't get it to work - I'm not entirely sure how to use the global option.
You could use grep with -P option,
$ grep '^nss_base_passwd' /etc/ldap.conf | grep -oP '(?<=description\=)[^)]*' | uniq
TD_FI
TD_F6
TRI_142
14_142
REX5
1950
Explanation:
A positive lookbehind makes grep print only the characters that come right after description=, up to the next ) bracket. The uniq command then removes the adjacent duplicates.
perl -nE 'say join(",", /description=\K([^)]+)/g) if /^nss_base_passwd/' /etc/ldap.conf
TD_FI,TD_F6,TD_F6,TRI_142,14_142,REX5,REX5,1950
Try this:
grep '^nss_base_passwd' /etc/ldap.conf |
grep -oE '[(]description=[^)]*' | sort -u |
cut -f2- -d=
Explanations:
With bash, if you end a line with | (or || or &&), the shell knows that the command continues on the next line, so you don't need to use \.
The second grep uses the -o flag to indicate that the matching expressions should be printed out, one per line. It also uses the -E flag to indicate that the pattern is an "Extended" (i.e. normal) regular expression.
Since -o will print the entire match, we need to extract the part after the prefix, for which we use cut, specifying a delimiter of =. -f2- means "all the fields starting with the second field", which we need in case there is an = in the description.
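The pipeline can be tried without touching /etc/ldap.conf by feeding it a sample line. The line below is a shortened copy of the one in the question, and LC_ALL=C is an addition made here so that the sort order is predictable across locales.

```shell
# Shortened copy of the nss_base_passwd line from the question.
line='nss_base_passwd ou=People,dc=ca,dc=somecompany,dc=com?one?|(description=TD_FI)(description=TD_F6)(description=TD_F6)(description=TRI_142)'

# grep -o prints each "(description=..." match on its own line,
# sort -u removes duplicates, and cut keeps everything after the first =.
descriptions=$(printf '%s\n' "$line" |
  grep -oE '[(]description=[^)]*' | LC_ALL=C sort -u |
  cut -f2- -d=)

printf '%s\n' "$descriptions"
```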
Avinash's answer was very close. Here is my improved version:
grep '^nss_base_passwd' /etc/ldap.conf | grep -Po '\(description=\K[^)]+' | sort -u
There is no need to use lookaround syntax when you can simply use \K (which is actually a shortcut for a corresponding zero-width assertion).
Also, you said that you want NO duplicates, but uniq will only remove duplicate adjacent lines, it will not remove duplicates if there is something in between. That's why I am using sort -u instead.
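The difference between uniq and sort -u shows up as soon as a duplicate is not adjacent. The values below are arranged for that purpose, and the awk line is an extra option (not taken from the answers) for cases where the original order should be kept.

```shell
# TD_F6 appears twice, but the two occurrences are not adjacent.
values=$(printf '%s\n' 'TD_F6' 'REX5' 'TD_F6')

with_uniq=$(printf '%s\n' "$values" | uniq)               # adjacent duplicates only
with_sort=$(printf '%s\n' "$values" | LC_ALL=C sort -u)   # all duplicates, sorted output
in_order=$(printf '%s\n' "$values" | awk '!seen[$0]++')   # all duplicates, first-seen order

printf 'uniq:\n%s\n' "$with_uniq"
printf 'sort -u:\n%s\n' "$with_sort"
printf 'awk:\n%s\n' "$in_order"
```

Here uniq still prints TD_F6 twice, while the other two print each value once.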

Resources