How to grep '*' in Unix Korn shell

I'm trying to find something in a file with a pattern that uses '*', but it is not working. Any idea how to do this?
This is what I'm trying to do:
grep "files*.txt" $myTestFile
It is not returning anything, although I expected "*" to match anything.

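By default, grep uses basic regular expressions (BRE); grep -E or egrep enables extended ones (ERE). In both, * means "zero or more of the preceding element" rather than "anything", so use .* to match any run of characters: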
egrep "files.*\.txt" $myTestFile
or
grep -E "files.*\.txt" $myTestFile
In addition, three variant programs egrep, fgrep and rgrep are available. egrep is the same as grep -E. fgrep is the same as grep -F. rgrep is the same as grep -r. Direct
invocation as either egrep or fgrep is deprecated, but is provided to allow historical applications that rely on them to run unmodified.
Matcher Selection
-E, --extended-regexp
Interpret PATTERN as an extended regular expression (ERE, see below). (-E is specified by POSIX.)
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched. (-F is specified by POSIX.)
-G, --basic-regexp
Interpret PATTERN as a basic regular expression (BRE, see below). This is the default.
-P, --perl-regexp
Interpret PATTERN as a Perl regular expression (PCRE, see below). This is highly experimental and `grep -P` may warn of unimplemented features.

If you only want to match the exact string files*.txt, that would be:
# match exactly "files*.txt"
grep -e "files[*][.]txt" "$myTestFile"
...or, more simply put using fgrep to match only the exact string given:
# match exactly "files*.txt"
fgrep -e 'files*.txt' "$myTestFile"
[*] defines a character class with only a single character -- the * -- contained, and thus matches only that one character. Backslash-based escaping is also possible, but can have different meanings in different contexts and thus is less reliable.
If you want to match any line that contains files, and later .txt, then:
# match any line containing "files" and later ".txt"
grep -e "files.*[.]txt" "$myTestFile"
.* matches zero-or-more characters, and is thus the regex equivalent to the glob-pattern *. Likewise, whereas in a glob pattern . matches only itself, in a regex . can match any character, so the . in .txt needs to be escaped, as in [.]txt, to prevent it from matching anything else.
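The three patterns discussed above can be compared side by side. This is a minimal sketch: the sample lines and the temporary file are made up for illustration, and grep -F is used in place of fgrep.

```shell
# Hypothetical sample data; the file contents are invented for this example.
myTestFile=$(mktemp)
printf '%s\n' 'files*.txt' 'files_2021.txt' 'file.txt' 'filesXtxt' > "$myTestFile"

# As a regex, "s*" means zero or more "s" and "." matches any one character,
# so the original pattern does not even match the literal text files*.txt.
as_regex=$(grep -e 'files*.txt' "$myTestFile")

# "files", then anything, then a literal ".txt".
any_between=$(grep -e 'files.*[.]txt' "$myTestFile")

# Fixed string: only the literal text files*.txt.
literal=$(grep -F -e 'files*.txt' "$myTestFile")

printf 'as regex:\n%s\n\nany between:\n%s\n\nliteral:\n%s\n' \
  "$as_regex" "$any_between" "$literal"
rm -f "$myTestFile"
```

With this data the first pattern selects file.txt and filesXtxt, the second selects files*.txt and files_2021.txt, and the fixed-string search selects only files*.txt.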

Related

Why does the second grep command not work?

I have a folder named "components" and within that folder a file name "apple"
If I cd to "components" folder and execute the following command:
ls | grep -G a*e
It works and returns apple correctly.
However, if I do not cd to components folder and execute the following command:
ls components | grep -G a*e
It does not work and returns blank. What could be the reason?
A third grep command below works fine.
ls components | grep ap
The actual filename I am grepping is complex. So I need the grep -G tag to work.
Unquoted, a*e is a shell glob pattern that is expanded by the shell before grep runs.
When you are in the directory, this:
ls | grep -G a*e
becomes
ls | grep -G apple
As you have a file named 'apple' this matches.
When you are not in the folder, and you run:
ls components | grep -G a*e
the shell again attempts to expand the glob pattern.
If there is any file in your current directory that matches (for example, "abalone"), then the glob will expand to that. It may expand to multiple strings if there is more than one such filename (for example, "abalone", "algae"). The command becomes something like:
ls components | grep -G abalone
ls components | grep -G abalone algae
In the first case, you will get blank output unless components directory also contains that filename.
In the second case, grep will ignore the directory entirely and attempt to find the string "abalone" inside the file "algae".
There is a third possibility: the glob fails to find anything. In this case, grep will receive the regexp a*e. The -G option to grep tells it to use BRE-style regexp. With these, a*e means "zero or more a followed by e". This is equivalent to saying "contains e".
In that case, you should see apple in your results regardless of whether you are in components or not. In a comment, you say that ls components | grep "a*e" returned nothing. As quoting should force precisely the same result as this third case, this is surprising.
Note that if you are intending to use globs you don't need grep at all:
cd components
ls a*e
ls components/a*e
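The three expansion cases described above can be observed directly by letting the shell expand the unquoted pattern into the positional parameters. This is only a sketch: the scratch directory and the extra file names (abalone, algae) are assumptions borrowed from the explanation, and it relies on the default behaviour where a glob with no match is passed through unchanged.

```shell
# Hypothetical layout, created in a scratch directory for illustration only.
scratch=$(mktemp -d)
mkdir "$scratch/components"
touch "$scratch/components/apple"

# Inside components: the unquoted glob expands to the matching file name.
cd "$scratch/components"
set -- a*e
inside="$*"

# One level up with no matching files: the pattern is passed through as-is.
cd "$scratch"
set -- a*e
outside_no_match="$*"

# One level up with two matching files: the pattern becomes two words.
touch abalone algae
set -- a*e
outside_two_matches="$*"

printf '%s\n' "inside: $inside" "no match: $outside_no_match" \
  "two matches: $outside_two_matches"
cd / && rm -rf "$scratch"
```

Whatever ends up in the positional parameters here is exactly what grep would have received as its arguments.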
a*e is a glob, not a regex. It's important to understand the difference.
The shell expands globs in unquoted arguments by matching the argument with available files. The * in a*e means "any sequence of characters not containing a directory separator", so it will match the filename apple (or accolade.node) as long as that file is present in the current directory. Glob matches are complete, not substring matches.
So when you execute grep a*e in a directory which contains the file apple, the shell will replace a*e with the word apple before invoking grep, making the command grep apple. If the directory also contained the file accolade.node, the shell would have put that into the command line as well; grep accolade.node apple. That's very rarely what you want to happen to grep arguments (other than filename arguments), so it's highly recommended to get into the habit of quoting arguments.
Unlike the shell, grep is based on regular expression matching. In a regular expression, * means "any number of repetitions of the previous element", so the regular expression a*e will match e, ae, aae, aaae, and so on. Since grep does substring matching (by default), those strings could be anywhere in the line being matched. That will match the e in apple, for example, but it will also match any other line which contains an e, such as electronics. (That makes it a bit surprising that ls components | grep "a*e" did not match components/apple. Perhaps there was some typing problem.)
In order to match a followed by a sequence of arbitrary characters followed by an e, you could use the regular expression a.*e (i.e. grep "a.*e" -- note the use of quotes to avoid having the shell try to expand that argument as a glob). But that will probably match too much, if you're expecting it to do the same thing as the glob a*e. You might want to add some restrictions. For example, grep -w forces the match to be complete words. And (with GNU grep, at least) you can use grep -w "a\S*e" to match a complete word which starts with a and ends with e, using the \S shortcut (any character other than whitespace).
You very rarely want to use -G, by the way, particularly since it's the default (unfortunately). Most of the time, you'll want to use grep -E in order to not have to insert backslashes throughout your pattern. Please read man 7 regex for a quick overview of regex syntax and the difference between basic and extended POSIX regexes. man grep is also useful, of course.
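To make the difference between the patterns concrete, here is a small sketch with made-up input lines. The anchored pattern ^a.*e$ is an addition of this example (it is not from the answer above); it is a portable alternative to the GNU-only \S shortcut when each line holds exactly one name.

```shell
# Hypothetical input lines, chosen to show what each pattern accepts.
names=$(printf '%s\n' apple electronics grape kiwi)

# a*e: zero or more "a" followed by "e", so any line containing "e" matches.
star=$(printf '%s\n' "$names" | grep 'a*e')

# a.*e: an "a", then anything, then an "e" somewhere later on the line.
dot_star=$(printf '%s\n' "$names" | grep 'a.*e')

# ^a.*e$: the whole line must start with "a" and end with "e",
# which is closest to what the glob a*e does for file names.
anchored=$(printf '%s\n' "$names" | grep '^a.*e$')

printf 'a*e:\n%s\n\na.*e:\n%s\n\n^a.*e$:\n%s\n' "$star" "$dot_star" "$anchored"
```

All patterns are quoted so that the shell never gets a chance to treat them as globs.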

Using regex in grep filename

I want to search a certain string in a number of archival log folders which reflect different servers. I use 2 different commands as of now
-bash-4.1$ zcat /mnt/bkp/logs/cmmt-54-22[8-9]/my_app.2021-12-28-* | grep 'abc'
and
-bash-4.1$ zcat /mnt/bkp/logs/cmmt-54-23[0-3]/my_app.2021-12-28-* | grep 'abc'
I basically want to search on server folders cmmt-54-228, cmmt-54-229 .... cmmt-54-233.
I tried combining the two commands into one, but it doesn't seem to work; I am probably making some mistake in the regex:
-bash-4.1$ zcat /mnt/bkp/logs/cmmt-54-22[8-9]|3[0-3]/my_app.2021-12-28-* | grep 'abc'
Please help.
Regex is not glob. See man 7 glob vs man 7 regex.
grep works with regex: it filters lines that match a regular expression.
The shell expands the words that you write: any word containing a filename-expansion trigger (*, ? or [) is replaced with the list of matching filenames.
You can use extended pattern matching (see man bash), which sounds like the most natural here:
shopt -s extglob
echo /mnt/bkp/logs/cmmt-54-2@(2[8-9]|3[0-3])/my_app.2021-12-28-*
In interactive shell I would just write it twice:
zcat /mnt/bkp/logs/cmmt-54-22[8-9]/my_app.2021-12-28-* /mnt/bkp/logs/cmmt-54-23[0-3]/my_app.2021-12-28-*
Or with brace expansion (see man bash):
zcat /mnt/bkp/logs/cmmt-54-2{2[8-9],3[0-3]}/my_app.2021-12-28-*
Brace expansion first replaces the word with two words, then filename expansion replaces each of them with the actual filenames.
You can also find files with find's -regex option; see man find. (Or output a list of filenames, pipe it to grep, and then use xargs or similar to pass the result to a command.)
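The following sketch checks that the three spellings select the same files. It requires bash, and the directory tree is a hypothetical stand-in for /mnt/bkp/logs built in a temporary location.

```shell
# Requires bash: brace expansion and extglob are not POSIX sh features.
# Hypothetical directory tree; the server numbers are made up for the test.
root=$(mktemp -d)
for n in 227 228 229 230 233 234; do
  mkdir "$root/cmmt-54-$n"
  touch "$root/cmmt-54-$n/my_app.2021-12-28-01"
done

# Two globs written out in full.
two_globs=$(printf '%s\n' "$root"/cmmt-54-22[8-9]/my_app.2021-12-28-* \
                          "$root"/cmmt-54-23[0-3]/my_app.2021-12-28-*)

# Brace expansion produces the same two words before globbing happens.
braces=$(printf '%s\n' "$root"/cmmt-54-2{2[8-9],3[0-3]}/my_app.2021-12-28-*)

# extglob must be enabled on an earlier line than the pattern that uses it.
shopt -s extglob
extended=$(printf '%s\n' "$root"/cmmt-54-2@(2[8-9]|3[0-3])/my_app.2021-12-28-*)

count=$(printf '%s\n' "$braces" | wc -l)
echo "matched $count files"
rm -rf "$root"
```

Using printf or echo first, as here, is a cheap way to preview what a pattern expands to before handing it to zcat.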

Grepping for exact string while ignoring regex for dot character

So here's my issue. I need to develop a small bash script that can grep a file containing account names (let's call it file.txt). The contents would be something like this:
accounttest
account2
account
accountbtest
account.test
Matching an exact line SHOULD be easy but apparently it's really not.
I tried:
grep "^account$" file.txt
The output is:
account
So in this situation the output is OK, only "account" is displayed.
But if I try:
grep "^account.test$" file.txt
The output is:
accountbtest
account.test
So the next obvious solution that comes to mind, in order to stop interpreting the dot character as "any character", is using fgrep, right?
fgrep account.test file.txt
The output, as expected, is correct this time:
account.test
But what if I try now:
fgrep account file.txt
Output:
accounttest
account2
account
accountbtest
account.test
This time the output is completely wrong, because I can't use the beginning/end line characters with fgrep.
So my question is, how can I properly grep a whole line, including the beginning and end of line special characters, while also matching exactly the "." character?
EDIT: Please note that I do know that the "." character needs to be escaped, but in my situation, escaping is not an option, because of further processing that needs to be done to the account name, which would make things too complicated.
The . is a special character in regex notation which needs to be escaped to match it as a literal string when passing to grep, so do
grep "^account\.test$" file.txt
Or if you cannot afford to modify the search string use the -F flag in grep to treat it as literal string and not do any extra processing in it
grep -Fx 'account.test' file.txt
From man grep
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings (instead of regular expressions), separated by newlines, any of which is to be matched.
-x, --line-regexp
Select only those matches that exactly match the whole line. For a regular expression pattern, this is like parenthesizing the pattern and then surrounding it with ^ and $.
fgrep is the same as grep -F. grep also has the -x option which matches against whole lines only. You can combine these to get what you want:
grep -Fx account.test file.txt
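As a quick check of the -Fx combination, this sketch recreates the sample file from the question in a temporary location. The variable names are assumptions of the example; the last search shows that the account name can come from a variable without any escaping.

```shell
# Recreate the sample file from the question in a temporary location.
accounts=$(mktemp)
printf '%s\n' accounttest account2 account accountbtest account.test > "$accounts"

# -F: treat the pattern as a literal string; -x: it must match the whole line.
dotted=$(grep -Fx 'account.test' "$accounts")
plain=$(grep -Fx 'account' "$accounts")

# The search string can come from a variable unmodified.
# "--" guards against names that start with a dash.
name='account.test'
from_var=$(grep -Fx -- "$name" "$accounts")

printf '%s\n' "dotted: $dotted" "plain: $plain" "from variable: $from_var"
rm -f "$accounts"
```

Each search returns exactly one line: accountbtest is no longer picked up by the dotted name, and the plain name no longer matches its longer variants.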

recursively replace text with sed

I want to use sed to replace each occurrence of a particular text in a full source file tree. I've attempted the following:
$ grep -rlI name2port\(\"Wan1\"\) . --exclude-dir=.svn --exclude=*.vxs | xargs sed -i 's/name2port\(\"Wan1\"\)/T_PORT_ID_WAN1/g'
but it doesn't seem to work; I think my sed command isn't correct. How do I do this?
The problem is that the replacements just do not happen.
I tried this:
$ sed -i 's/name2port\(\"Wan1\"\)/T_PORT_ID_WAN1/g' ./rtos_core/jpax_switch/api/src/nms/switch_l3_route.c
but it turns out that the occurrences of name2port("Wan1") were not replaced.
sed uses BREs (basic regular expressions) by default, which, for historical reasons - and surprisingly for someone used to modern regular expressions - require escaping of certain metacharacters in order to be recognized as such.
In BREs, ( and ) are ordinary (literal) characters, and only become special when \-escaped.
Therefore, to match literal name2port("Wan1"), use that literal as-is in a BRE (given that you also don't need to \-escape " instances):
sed -i 's/name2port("Wan1")/T_PORT_ID_WAN1/g' ./rtos_core/jpax_switch/api/src/nms/switch_l3_route.c
If you're not concerned about portability, you can use -r (or -E, which macOS/BSD sed also understands, although there -i requires an explicit backup-suffix argument, so the command as written won't work), which then enables EREs (extended regular expressions), whose syntax and features are more likely to work as you expect:
sed -r -i 's/name2port\("Wan1"\)/T_PORT_ID_WAN1/g' ./rtos_core/jpax_switch/api/src/nms/switch_l3_route.c
Note how literals ( and ) now do need to be \-escaped, lest they be interpreted as enclosing a capture group.
In this particular case, it is the BRE that requires less escaping than the ERE; generally, though, it is the opposite.
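The effect of the escaping rules can be seen without touching any files by feeding sed a single line on standard input. This is a sketch: the sample line is invented, and -E is used for the ERE case on the assumption that the installed sed accepts it (GNU and BSD sed both do).

```shell
# Sample input line (hypothetical; stands in for a line of the C source).
line='port = name2port("Wan1");'

# BRE (default): bare parentheses are literal, so this substitutes.
bre_literal=$(printf '%s\n' "$line" | sed 's/name2port("Wan1")/T_PORT_ID_WAN1/g')

# BRE with \( \): these form a capture group, so the pattern looks for
# name2port"Wan1" without parentheses and leaves the line unchanged.
bre_escaped=$(printf '%s\n' "$line" | sed 's/name2port\("Wan1"\)/T_PORT_ID_WAN1/g')

# ERE (-E): parentheses are special, so literal ones need a backslash.
ere_escaped=$(printf '%s\n' "$line" | sed -E 's/name2port\("Wan1"\)/T_PORT_ID_WAN1/g')

printf '%s\n' "$bre_literal" "$bre_escaped" "$ere_escaped"
```

Testing the expression this way before adding -i avoids modifying a whole source tree with a pattern that silently matches nothing.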

Remove everything but one file tcsh

I basically want to do the following bash command but in tcsh:
rm !(file1)
Thanks
You can use ls -1 (that's the number one, not the lowercase letter L) to list one file per line, and then use grep -vx <pattern> to exclude (-v) lines that exactly (-x) match <pattern>, and then xargs it to your command, rm. For example,
ls -1 | grep -vx file1 | xargs rm
In case your version of grep doesn't support the -x option, you can use anchors:
ls -1 | grep -v '^file1$' | xargs rm
To use this with commands other than rm that may not take an arbitrary number of arguments, remember to add the -n 1 option to xargs so that arguments are handled one by one:
ls -1 | grep -v '^file1$' | xargs -n 1 rm
You can also achieve this with find by negating a -name test, for example find . -type f ! -name file1. find does not understand the !(file1) syntax itself, and you still have to pass the results to rm, either by piping them to xargs or with find's -exec. Note that find descends into subdirectories unless you limit the depth.
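Before running the pipeline for real, it can be tried in a throwaway directory. The sketch below uses sh syntax to capture results in variables (the pipeline itself reads the same in tcsh); the file names are made up, and like the original it assumes names without whitespace, since xargs splits on blanks.

```shell
# Scratch directory with hypothetical file names.
scratch=$(mktemp -d)
cd "$scratch"
touch file1 file10 file2 notes

# Dry run: list what would be removed. -x stops file1 from also
# excluding file10, which a plain substring match would do.
would_remove=$(ls -1 | grep -vx file1)
printf 'would remove:\n%s\n' "$would_remove"

# Real run, once the list looks right.
ls -1 | grep -vx file1 | xargs rm
remaining=$(ls -1)
printf 'remaining: %s\n' "$remaining"
cd / && rm -rf "$scratch"
```

Only file1 is left afterwards; file10 is removed because -x requires the whole line to match.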
tcsh has a special ^ syntax for glob patterns (not supported in csh, sh, or bash). Prefixing a glob pattern with ^ negates it, causing it to match all file names that don't match the pattern.
Quoting the tcsh manual:
An entire glob-pattern can also be negated with `^':
> echo *
bang crash crunch ouch
> echo ^cr*
bang ouch
A single file name is not a glob pattern, and so the ^ prefix doesn't apply to it, but it can be turned into one by, for example, surrounding the first character with square brackets.
So this:
rm ^[f]ile1
should remove all files in the current directory other than file1.
I strongly recommend testing this before using it, either by using an echo command first:
echo ^[f]ile1
or by using Ctrl-X * to expand the pattern to a list of files before hitting Enter.
UPDATE: I've since learned that bash supports similar functionality but with a different syntax. In bash, !(PATTERN) matches anything not matched by the pattern. This is not recognized unless the extglob shell option is enabled. Unlike tcsh's ^ syntax, the pattern can be a single file name. This isn't relevant to what you're asking, but it could be useful if you ever decide to switch to bash.
zsh probably has something similar.
