Why is this grep group failing - macos

I am trying to do something like this in my OS X terminal:
> grep -i "((\D*)ful)" ./Myfile.rtf
The above command fails. However, when I do this:
> grep -i "\D*ful" ./Myfile.rtf
it works. Does grep have an issue with regex groups?

Since plain grep uses BRE (basic regular expressions), you need to write \(..\) for a capturing group:
grep -i "\(\(\D*\)ful\)" ./Myfile.rtf

The most likely problem when this sort of thing happens is that characters you expect to be special are not, or the other way round. In this case, I think the parentheses are not special unless you escape them with a backslash, so:
> grep -i "\(\(\D*\)ful\)" ./Myfile.rtf
would probably work better.
[One of the irritations of regex is the variation that has developed in exactly how they are written...]
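To make the difference concrete, here is a small sketch comparing the two syntaxes. The sample text is made up, and [^0-9] is used in place of \D on the assumption that \D is not reliably available, since it is not part of POSIX BRE or ERE.

```shell
# Hypothetical sample input standing in for Myfile.rtf
sample=$(printf 'Wonderful day\nplain text\nCAREFUL now\n')

# BRE (plain grep): groups are written with backslashed parentheses
bre=$(printf '%s\n' "$sample" | grep -i '\(\([^0-9]*\)ful\)')

# ERE (grep -E): bare parentheses are groups
ere=$(printf '%s\n' "$sample" | grep -E -i '(([^0-9]*)ful)')

printf '%s\n' "$bre"
printf '%s\n' "$ere"
```

Both commands should select the same two lines. In BRE, unescaped parentheses are matched as literal characters, which is why the original pattern found nothing.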

Related

How to append the specific path in the given file list and update the filelist [duplicate]

I have a file r. I want to replace the words File and MINvac.pdb in it with nothing. The commands I used are
sed -i 's/File//g' /home/kanika/standard_minimizer_prosee/r
and
sed -i 's/MINvac.pdb//g' /home/kanika/standard_minimizer_prosee/r
I want to combine both sed commands into one, but I don't know the way. Can anyone help?
The file looks like this:
-6174.27 File10MINvac.pdb
-514.451 File11MINvac.pdb
4065.68 File12MINvac.pdb
-4708.64 File13MINvac.pdb
6674.54 File14MINvac.pdb
8563.58 File15MINvac.pdb
sed is a scripting language. You separate commands with a semicolon or a newline. Many sed dialects also allow you to pass each command as a separate -e option argument.
sed -i 's/File//g;s/MINvac\.pdb//g' /home/kanika/standard_minimizer_prosee/r
I also added a backslash to properly quote the literal dot before pdb, but in this limited context that is probably unimportant.
For completeness, here is the newline variant. Many newcomers are baffled that the shell allows literal newlines in quoted strings, but it can be convenient.
sed -i 's/File//g
s/MINvac\.pdb//g' /home/kanika/standard_minimizer_prosee/r
Of course, in this limited case, you could also combine everything into one regex:
sed -i 's/\(File\|MINvac\.pdb\)//g' /home/kanika/standard_minimizer_prosee/r
(Some sed dialects will want this without backslashes, and/or offer an option to use extended regular expressions, where the backslashes should be omitted. BSD sed, and thus also macOS sed, requires an argument to sed -i, which can however be empty, like sed -i ''.)
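As a rough check that the variants behave the same, the sketch below runs them on two lines from the question without -i, so nothing is modified. It assumes a sed that accepts -E for extended regular expressions, which both GNU and BSD sed do.

```shell
# Two lines taken from the question's sample file
input=$(printf '%s\n' '-6174.27 File10MINvac.pdb' '-514.451 File11MINvac.pdb')

# Semicolon-separated commands in a single script
semi=$(printf '%s\n' "$input" | sed 's/File//g;s/MINvac\.pdb//g')

# One extended regex with alternation; no backslashes before ( | )
alt=$(printf '%s\n' "$input" | sed -E 's/(File|MINvac\.pdb)//g')

printf '%s\n' "$semi"
printf '%s\n' "$alt"
```

Once the output looks right, the same script can be used with -i on the real file.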
Use the -e flag:
sed -i -e 's/File//g' -e 's/MINvac\.pdb//g' /home/kanika/standard_minimizer_prosee/r
Once you have more commands than are convenient to pass with -e options, it is better to store them in a separate file and include it with the -f flag.
In this case, you'd make a file containing:
s/File//g
s/MINvac\.pdb//g
Let's call that file 'sedcommands'. You'd then use it with sed like this:
sed -i -f sedcommands /home/kanika/standard_minimizer_prosee/r
With only two commands, it's probably not worthwhile using a separate file of commands, but it is quite convenient if you have a lot of transformations to make.
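A minimal sketch of the -f approach, using a temporary file instead of a file named sedcommands and one sample line from the question. The variable names are only for illustration.

```shell
# Write the sed commands to a temporary script file (the name is arbitrary)
cmds=$(mktemp)
printf '%s\n' 's/File//g' 's/MINvac\.pdb//g' > "$cmds"

# Run without -i first so the result can be inspected safely
result=$(printf '%s\n' '4065.68 File12MINvac.pdb' | sed -f "$cmds")
printf '%s\n' "$result"

rm -f "$cmds"
```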

How to use sed to remove ./ between two characters in Unix shell

I am trying to remove ./ between two characters using sed but not getting the desired output.
Sample:
e2b66a3d84ee448c33d7f2a2f7e51c58 ./2017_06_10_0400.txt
I tried the command below, but it does not work as expected: even the . in ".txt" gets removed.
sed -i 's/[./,]//g'
Beware: don't even think of using the -i option until you know the code is working. You can screw things up big time!
Use:
sed -e 's%[.]/%%g'
You can choose the delimiter in a s/// command, and when the regular expressions involve /, it is sensible to choose something else — I often use % when it doesn't figure in the text. The -e is optional. Using [.] to detect an actual dot is one way; you can write \. if you prefer, but I'm allergic to avoidable backslashes (if you've never had to write 16 backslashes in a row to get troff to do what you want, you haven't suffered enough).
Be aware that the -i option behaves differently in GNU sed and BSD (macOS) sed. Using -i.bak works in both (for an arbitrary, non-empty string such as .bak). Otherwise, your code isn't portable (which may or may not matter to you now, but might well do later on).
You have:
sed -i 's/[./,]//g'
The trouble with this is that it looks for any of the characters ., / or , in isolation — so it removes the . in .txt as well as the . and / in ./. You need to look for consecutive characters — as in my suggested solution.
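The difference between the character class and the two-character sequence can be seen side by side in this small sketch, which uses the sample line from the question and no -i.

```shell
line='e2b66a3d84ee448c33d7f2a2f7e51c58 ./2017_06_10_0400.txt'

# Character class: deletes every '.', '/' and ',' wherever it occurs
class_out=$(printf '%s\n' "$line" | sed 's/[./,]//g')

# Sequence: deletes only a '.' that is immediately followed by '/'
seq_out=$(printf '%s\n' "$line" | sed 's%[.]/%%g')

printf '%s\n' "$class_out"
printf '%s\n' "$seq_out"
```

The first result loses the dot in .txt, the second keeps it.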
Try this:
echo "e2b66a3d84ee448c33d7f2a2f7e51c58 ./2017_06_10_0400.txt" | sed -e 's|\./||'
You need to escape the dot with a backslash (\):
's#\.\/##g'
:=>echo "e2b66a3d84ee448c33d7f2a2f7e51c58 ./2017_06_10_0400.txt" | sed 's#\.\/##g'
e2b66a3d84ee448c33d7f2a2f7e51c58 2017_06_10_0400.txt
:=>

Grep Search in Shell Script

I am trying to implement a system based on shell scripts and PHP. It gets a string from a PHP file and processes it with shell scripts.
It works every time except for strings like: "/jobs?location_country=united+states&sort_by=cfml10%2cdesc&v_location=usa"
grep does not find a match for this string.
How can I solve this?
Code is :
hcm=$(php largest.php "$file"_hcm_input.txt "$remove")
echo "$hcm"
grep "$hcm" "$file"_sorted.txt > "$file"_jobs.txt
Use grep -F or grep --fixed-strings to tell grep to treat the argument as a fixed string rather than a regex.
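Here is a small sketch of what -F changes. The data and pattern are hypothetical and chosen so that a regex metacharacter (the dot) makes a visible difference.

```shell
# Hypothetical data: the second line only matches if '.' is a wildcard
data=$(printf '%s\n' 'sort_by=cfml1.0' 'sort_by=cfml1x0')
pattern='cfml1.0'

# As a regex, '.' matches any character, so both lines are counted
regex_hits=$(printf '%s\n' "$data" | grep -c -- "$pattern")

# With -F the pattern is a literal string, so only the first line is counted
fixed_hits=$(printf '%s\n' "$data" | grep -c -F -- "$pattern")

printf 'regex: %s, fixed: %s\n' "$regex_hits" "$fixed_hits"
```

The -- guards against a pattern that happens to start with a dash.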
As you have & in the string, even if you enclose it in single/double quotes it won't work.
In this case you have to escape &, using \ like \&.
After a lot of trial and error, I found that the problem is not &, = or +, as I checked each one individually:
grep "=" s37_sorted.txt
grep "&" s37_sorted.txt
grep "+" s37_sorted.txt
Each of these gives me output.
The actual cause was letter case, so the search has to be case-insensitive. The grep option for that is -i:
hcm=$(php largest.php "$file"_hcm_input.txt "$remove")
echo "Highest Common String: $hcm"
grep -i "$hcm" "$file"_sorted.txt > "$file"_jobs.txt
Now it's showing me 366 records.
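The two suggestions can be combined. The sketch below assumes the file holds the same string in a different letter case, which is a made-up stand-in for the real contents of the sorted file.

```shell
# Hypothetical line as it might appear in the sorted file (mixed case)
data='/jobs?location_country=United+States&sort_by=cfml10%2Cdesc&v_location=USA'
needle='/jobs?location_country=united+states&sort_by=cfml10%2cdesc&v_location=usa'

# Case-sensitive literal search: no match, so the count is 0
# (|| true keeps a "no match" exit status from aborting a script run with set -e)
exact=$(printf '%s\n' "$data" | grep -c -F -- "$needle" || true)

# -i ignores case, -F keeps every character of the needle literal
nocase=$(printf '%s\n' "$data" | grep -c -i -F -- "$needle" || true)

printf 'exact: %s, ignoring case: %s\n' "$exact" "$nocase"
```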

Does grep support the OR in a group?

I am looking at this question: https://leetcode.com/problems/valid-phone-numbers/
which asks for a command to extract the phone numbers.
I found this command works:
cat file.txt | grep -Eo '^(\([0-9]{3}\) ){1}[0-9]{3}-[0-9]{4}$|^([0-9]{3}-){2}[0-9]{4}$'
while this failed:
cat file.txt | grep -E '(^(\([0-9]{3}\))|^([0-9]{3}-))[0-9]{3}-[0-9]{4}'
I don't know why the second one failed. Is it because grep doesn't support OR in a group?
No, it's because you dropped the space, so a space in a phone number is no longer allowed.
Also, the grouping in your regex seems to be off by a whack or two. What are you actually trying to express?
Finally, you have a useless use of cat -- grep can perfectly well read one or more input files without the help of cat.
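For illustration, here is one way the alternation could be written inside a single group, with the space kept and the file passed to grep directly. The sample numbers are made up, and this is a sketch of the idea rather than necessarily the pattern the asker intended.

```shell
# Hypothetical sample input in a temporary file
tmp=$(mktemp)
printf '%s\n' '987-123-4567' '123 456 7890' '(123) 456-7890' > "$tmp"

# One group with alternation: either "(xxx) " or "xxx-", followed by xxx-xxxx
valid=$(grep -E '^(\([0-9]{3}\) |[0-9]{3}-)[0-9]{3}-[0-9]{4}$' "$tmp")
printf '%s\n' "$valid"

rm -f "$tmp"
```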

Removing duplicate entries from files on the basis of substring postfixes

Let's say that I have the following text in a file:
foo.bar.baz
bar.baz
123.foo.bar.baz
pqr.abc.def
xyz.abc.def
abc.def.ghi.jkl
def.ghi.jkl
How would I remove duplicates from the file, on the basis of postfixes? The expected output without duplicates would be:
bar.baz
pqr.abc.def
xyz.abc.def
def.ghi.jkl
(Consider foo.bar.baz and bar.baz. The latter is a substring postfix of the former, so only bar.baz remains. However, neither pqr.abc.def nor xyz.abc.def is a substring postfix of the other, so both remain.)
Try this:
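#!/bin/bash
INPUT_FILE="$1"
# Read the whole file; each line is later used as a candidate postfix
in="$(cat "$INPUT_FILE")"
out="$in"
# Word splitting on $in is intentional; it assumes lines contain no whitespace
for line in $in; do
    # Drop every remaining line that ends with ".<line>"
    # (note: dots inside $line are treated as regex wildcards here)
    out=$(echo "$out" | grep -v "\.$line\$")
done
echo "$out"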
You need to save it to a script (e.g. bashor.sh), make it executable (chmod +x bashor.sh) and call it with your input file as the first argument:
./bashor.sh path/to/input.txt
Use sed to escape each line for use as a regular expression, prefix an escaped dot, append $, and pipe the result into GNU grep (-f - doesn't work with BSD grep, e.g. on a Mac).
sed 's/[^-A-Za-z0-9_]/\\&/g; s/^/\\./; s/$/$/' test.txt | grep -vf - test.txt
I just used the regular-expression escaping from another answer and didn't think about whether it is reasonable. At first sight it seems fine, although it escapes more than necessary; this is probably not an issue.
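If regex escaping is a concern, the same idea can be sketched with shell pattern matching instead of grep. This is an alternative, not the method from the answers above: dedupe_postfix is a hypothetical helper name, and it compares every line with every other line, so it is only suitable for small files.

```shell
# Print the lines of a file that do not end in ".<some other line>"
dedupe_postfix() {
    file=$1
    while IFS= read -r candidate; do
        keep=1
        # Compare against every line of the file as a possible postfix
        while IFS= read -r suffix; do
            case $candidate in
                # The quoted part is matched literally, so no escaping is needed
                *".$suffix") keep=0; break ;;
            esac
        done < "$file"
        if [ "$keep" -eq 1 ]; then
            printf '%s\n' "$candidate"
        fi
    done < "$file"
    return 0
}
```

It would be called as dedupe_postfix path/to/input.txt and writes the surviving lines to standard output in their original order.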
