I have some trouble with "grep" and collating symbols - shell

Here is my problem. I have an existing file named data.f, and I want to match "48" in it by using "48" as a collating symbol in a bracket expression:
grep '[[.48.]]' data.f
but I get this error:
grep: Invalid collation character
However, character classes in bracket expressions work without any problem:
grep "[[:alpha:]]" data.f

if you want to grep 48
grep 48 file
if you want to grep "48"
grep '"48"' file
// to avoid discussion in comments I extend my post with more examples
if you want to grep n occurrences of "48" in one line you should use regular expressions
cat file | grep '\(.*"48"\)\{n\}' | grep -v '\(.*"48"\)\{n+1\}'
Basically, you grep lines with at least n occurrences, and then with invert-match you exclude lines with n+1 or more occurrences of the string, so you are left with lines that have exactly n occurrences.
In your comment you mentioned you wanted to grep lines with 5 occurrences of "48" that CAN be separated by other characters (that's the reason I put .* before "48"),
so here is the sample:
cat file | grep '\(.*"48"\)\{5\}' | grep -v '\(.*"48"\)\{6\}'
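To see the two-stage filter in action, here is a small sketch with made-up sample data. The temp file and the variable names n and exactly_n are assumptions for illustration, not part of the original question. It keeps lines with at least n occurrences and then drops those with n+1 or more:

```shell
# Hypothetical sample data: lines with one, two and three occurrences of "48".
tmp=$(mktemp)
printf '%s\n' 'a "48" b' '"48" x "48"' '"48""48""48"' > "$tmp"

n=2
# Stage 1 keeps lines with at least n occurrences;
# stage 2 (-v) removes lines with n+1 or more.
exactly_n=$(grep "\(.*\"48\"\)\{$n\}" "$tmp" | grep -v "\(.*\"48\"\)\{$((n + 1))\}")

printf '%s\n' "$exactly_n"
rm -f "$tmp"
```

With this sample data only the line that contains "48" exactly twice is left.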

Wouldn't grep '48' data.f work?

I have no idea what you mean by “I use collating symbol "48"” (I know what collation classes are, which is what grep expects to see in your input, but I don't know what a collation symbol would be), but from one of your comments, it seems you're actually looking for the exact string [[.48.]] in your file. Here are two ways of doing just that:
grep -F '[[.48.]]' data.f
grep '\[\[\.48\.]]' data.f
In one of your other comments, you asked for how to ask grep for lines with at least five occurrences of “48” on them. That's a pretty clear regex question:
grep -E '(.*48){5}' data.f
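As a rough check of the commands in this answer, the sketch below runs them against a made-up file. The file contents and the variable names fixed, escaped and five are assumptions for illustration. Note that in the non -F form an unescaped dot would match any character, so a line such as [[a48b]] would also be reported; the sketch escapes the dots to avoid that.

```shell
# Hypothetical sample data.
tmp=$(mktemp)
printf '%s\n' 'x = [[.48.]]' 'y = [[a48b]]' '48 48 48 48 48' '48 48' > "$tmp"

# -F treats the whole pattern as a fixed string, so nothing needs escaping.
fixed=$(grep -F '[[.48.]]' "$tmp")

# Without -F, the opening brackets and the dots have to be escaped.
escaped=$(grep '\[\[\.48\.]]' "$tmp")

# Lines with at least five occurrences of 48.
five=$(grep -E '(.*48){5}' "$tmp")

printf '%s\n' "$fixed" "$escaped" "$five"
rm -f "$tmp"
```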

Related

How to count the number of files each pattern of a group appears in a file?

I am having problems when trying to count the number of times a specific pattern appears in a file (let's call it B). In this case, I have a file with 30 patterns (let's call it A), and I want to know how many lines contain each pattern.
With only one pattern it is quite simple:
grep "pattern" file | wc -l
But with a file full of them I am not able to figure out how it may work. I already tried this:
grep -f "fileA" "fileB" | wc -l
Nevertheless, it gives me the total times all patterns appear, not each one of them (that's what I desire to get).
Thank you so much.
Count matches per literal string
If you simply want to know how often each pattern appears and each of your patterns is a fixed string (not a regex), use ...
grep -oFf needles.txt haystack.txt | sort | uniq -c
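A minimal sketch of that pipeline on made-up data (the directory, file contents and the variable match_counts are assumptions for illustration): -o prints every match on its own line, so sorting and counting the output gives the number of matches per needle.

```shell
# Hypothetical sample data.
dir=$(mktemp -d)
printf '%s\n' 'foo' 'bar' > "$dir/needles.txt"
printf '%s\n' 'foo bar foo' 'bar' 'baz' > "$dir/haystack.txt"

# One output line per match, then count identical lines.
match_counts=$(grep -oFf "$dir/needles.txt" "$dir/haystack.txt" | sort | uniq -c)

printf '%s\n' "$match_counts"
rm -rf "$dir"
```

Here foo is counted twice even though it only occurs on one line, which is the difference between counting matches and counting matching lines.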
Count matching lines per literal string
Note that the above is slightly different from your formulation "I want to know how many lines contain that pattern", as one line can have multiple matches. If you really have to count matching lines per pattern instead of matches per pattern, then things get a little bit trickier. The output has to be sorted again after the line numbers are cut off, because uniq -c only merges adjacent lines:
grep -noFf needles.txt haystack.txt | sort -u | cut -d: -f2- | sort | uniq -c
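The following sketch counts matching lines per needle on made-up data (file contents and the variable line_counts are assumptions for illustration). It sorts a second time after cutting off the line numbers, since uniq -c only merges adjacent identical lines and the first sort orders by line number, not by needle.

```shell
# Hypothetical sample data: foo matches three times, but on two lines only.
dir=$(mktemp -d)
printf '%s\n' 'foo' 'bar' > "$dir/needles.txt"
printf '%s\n' 'foo foo' 'bar' 'foo bar' > "$dir/haystack.txt"

line_counts=$(grep -noFf "$dir/needles.txt" "$dir/haystack.txt" \
    | sort -u \
    | cut -d: -f2- \
    | sort \
    | uniq -c)
# sort -u drops repeated "line:needle" pairs, cut removes the line number,
# and the second sort groups equal needles before uniq -c counts them.

printf '%s\n' "$line_counts"
rm -rf "$dir"
```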
Count matching lines per regex
If the patterns are regexes, you probably have to iterate over the patterns, as grep's output only tells you that (at least) one pattern matched, but not which one.
# this will be very slow if you have many patterns
while IFS= read -r pattern; do
    printf '%8d %s\n' "$(grep -ce "$pattern" haystack.txt)" "$pattern"
done < needles.txt
... or use a different tool/language like awk or perl.
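One possible awk variant is sketched below; it reads the haystack only once. This is an assumption-laden sketch, not a drop-in replacement: awk interprets the patterns as extended regular expressions (grep without -E uses basic ones), the NR == FNR idiom assumes needles.txt is not empty, and the sample data and the variable line_counts are made up.

```shell
# Hypothetical sample data.
dir=$(mktemp -d)
printf '%s\n' '^a' 'b$' 'z' > "$dir/needles.txt"
printf '%s\n' 'ab' 'cb' 'ac' > "$dir/haystack.txt"

line_counts=$(awk '
    NR == FNR { pat[++n] = $0; next }    # first file: remember each pattern
    { for (i = 1; i <= n; i++) if ($0 ~ pat[i]) count[i]++ }
    END { for (i = 1; i <= n; i++) printf "%d %s\n", count[i], pat[i] }
' "$dir/needles.txt" "$dir/haystack.txt")

printf '%s\n' "$line_counts"
rm -rf "$dir"
```

Each line is tested against every pattern, so a line that matches several patterns is counted once for each of them.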
Note on overlapping matches
You did not formulate any precise requirements, so I went with the simplest solutions for each case. The first two solutions and the last solution behave differently in case multiple patterns match (part of) the same substring.
grep -f needles.txt matches each substring at most once. Therefore some matches might be "missed" (interpretation of "missed" depends on your requirements)
whereas iterating grep -e pattern1; grep -e pattern2; ... might match the same substring multiple times.

Grepping for exact string while ignoring regex for dot character

So here's my issue. I need to develop a small bash script that can grep a file containing account names (let's call it file.txt). The contents would be something like this:
accounttest
account2
account
accountbtest
account.test
Matching an exact line SHOULD be easy but apparently it's really not.
I tried:
grep "^account$" file.txt
The output is:
account
So in this situation the output is OK, only "account" is displayed.
But if I try:
grep "^account.test$" file.txt
The output is:
accountbtest
account.test
So the next obvious solution that comes to mind, in order to stop interpreting the dot character as "any character", is using fgrep, right?
fgrep account.test file.txt
The output, as expected, is correct this time:
account.test
But what if I try now:
fgrep account file.txt
Output:
accounttest
account2
account
accountbtest
account.test
This time the output is completely wrong, because I can't use the beginning/end line characters with fgrep.
So my question is, how can I properly grep a whole line, including the beginning and end of line special characters, while also matching exactly the "." character?
EDIT: Please note that I do know that the "." character needs to be escaped, but in my situation, escaping is not an option, because of further processing that needs to be done to the account name, which would make things too complicated.
The . is a special character in regex notation which needs to be escaped to match it as a literal string when passing to grep, so do
grep "^account\.test$" file.txt
Or if you cannot afford to modify the search string use the -F flag in grep to treat it as literal string and not do any extra processing in it
grep -Fx 'account.test' file.txt
From man grep
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings (instead of regular expressions), separated by newlines, any of which is to be matched.
-x, --line-regexp
Select only those matches that exactly match the whole line. For a regular expression pattern, this is like parenthesizing the pattern and then surrounding it with ^ and $.
fgrep is the same as grep -F. grep also has the -x option which matches against whole lines only. You can combine these to get what you want:
grep -Fx account.test file.txt
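A quick sketch of the -Fx combination using the account list from the question (the temp file and the variable names name, dotted and plain are assumptions for illustration). The account name is passed unescaped, and -- guards against names that start with a dash:

```shell
tmp=$(mktemp)
printf '%s\n' accounttest account2 account accountbtest account.test > "$tmp"

name='account.test'   # hypothetical variable holding the unescaped name

# -F: no regex interpretation, -x: the whole line has to match.
dotted=$(grep -Fx -- "$name" "$tmp")
plain=$(grep -Fx -- 'account' "$tmp")

printf '%s\n' "$dotted" "$plain"
rm -f "$tmp"
```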

grep pipe searching for one word, not line

For some reason I cannot get this to output just the version of this line. I suspect it has something to do with how grep interprets the dash.
This command:
admin#DEV:~/TEMP$ sendemail
Yields the following:
sendemail-1.56 by Brandon Zehm
More output below omitted
The first line is of interest. I'm trying to store the version to variable.
TESTVAR=$(sendemail | grep '\s1.56\s')
Does anyone see what I am doing wrong? Thanks
TESTVAR is just empty. Even without TESTVAR, the output is empty.
I just tried the following too, thinking this might work.
sendemail | grep '\<1.56\>'
I just tried it again while editing, and I think I have another issue. Perhaps I'm not handling the output correctly. It's outputting the entire line, but I can see that grep is finding 1.56 because it highlights it in the line.
$ TESTVAR=$(echo 'sendemail-1.56 by Brandon Zehm' | grep -Eo '1\.56')
$ echo "$TESTVAR"
1.56
The point is grep -Eo '1\.56' (the dot is escaped so that it only matches a literal dot)
from grep man page:
-E, --extended-regexp
Interpret PATTERN as an extended regular expression (ERE, see below). (-E is specified by POSIX.)
-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output
line.
Your regular expression doesn't match the form of the version. You have specified that the version is surrounded by spaces, yet in front of it you have a dash.
Replace the first \s with the capitalized form \S, or with an explicit set of characters, and it should work.
I'm wondering: In your example you seem to know the version (since you grep for it), so you could just assign the version string to the variable. I assume that you want to obtain any (unknown) version string there. The regular expression for this in sed could be (using POSIX character classes):
sendemail |sed -n -r '1 s/sendemail-([[:digit:]]+\.[[:digit:]]+).*/\1/ p'
The -n suppresses the normal default output of every line; -r enables extended regular expressions; the leading 1 tells sed to only work on line 1 (I assume the version appears in the first line). I anchored the version number to the telltale string sendemail- so that potential other numbers elsewhere in that line are not matched. If the program name changes or the hyphen goes away in future versions, this wouldn't match any longer though.
Both the grep solution above and this one have the disadvantage of reading the whole output, which (as emails go these days) may be long. In addition, grep would find every other line in the program's output that contains the pattern (if it's indeed emails, somebody might discuss this problem in them, with examples!). If the version is indeed on the first line, piping through head -n 1 first would be efficient and prudent.
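A sketch of the "first line only" idea, using a stand-in function instead of the real program so that it runs anywhere. The function fake_sendemail and the variable version are made up for illustration; the digits-dot-digits pattern is an assumption about the version format.

```shell
# Stand-in for the real sendemail command (hypothetical).
fake_sendemail() {
    printf '%s\n' 'sendemail-1.56 by Brandon Zehm' 'More output below omitted'
}

# Look at the first line only, then print just the part that looks like a version.
version=$(fake_sendemail | head -n 1 | grep -Eo '[0-9]+\.[0-9]+')

printf '%s\n' "$version"
```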
jayadevan#jayadevan-Vostro-2520:~$ echo $sendmail
sendemail-1.56 by Brandon Zehm
jayadevan#jayadevan-Vostro-2520:~$ echo $sendmail | cut -f2 -d "-" | cut -f1 -d" "
1.56

How to parse a config file using sed

I've never used sed apart from the few hours trying to solve this. I have a config file with parameters like:
test.us.param=value
test.eu.param=value
prod.us.param=value
prod.eu.param=value
I need to parse these and output this if REGIONID is US:
test.param=value
prod.param=value
Any help on how to do this (with sed or otherwise) would be great.
This works for me:
sed -n 's/\.us\././p'
i.e. if the ".us." can be replaced by a dot, print the result.
If there are hundreds and hundreds of lines it might be more efficient to first search for lines containing .us. and then do the string replacement... AWK is another good choice or pipe grep into sed
cat INPUT_FILE | grep "\.us\." | sed 's/\.us\./\./g'
Of course if '.us.' can be in the value this isn't sufficient.
You could also do this with the address syntax (technically you can embed the second sed into the first statement as well; I just can't remember the syntax):
sed -n '/\(prod\|test\)\.us\.[^=]*=/p' FILE | sed 's/\.us\./\./g'
We should probably do something cleaner. If the format is always environment.region.param we could look at forcing this only to occur on the text PRIOR to the equal sign.
sed -n 's/^\([^.=]*\)\.us\.\([^=]*\)=/\1.\2=/p'
This only changes lines that start with a run of characters containing no '.' or '=', followed by '.us.', followed by any number of characters before the first '=' sign. This way we won't modify a '.us.' that appears inside a "value".
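The idea of restricting the substitution to the key can be sketched as follows. The variable region (a lower-cased REGIONID), the inline sample config and the variable result are assumptions for illustration; one sample value deliberately contains .us. to show that it is left alone.

```shell
region=us   # hypothetical: the lower-cased REGIONID

config='test.us.param=value
test.eu.param=value
prod.us.param=value.us.x
prod.eu.param=value'

# Anchor at the start of the line and stop at the first "=", so only the
# key is rewritten. -n plus the p flag prints just the lines that changed.
result=$(printf '%s\n' "$config" \
    | sed -n "s/^\([^.=]*\)\.$region\.\([^=]*\)=/\1.\2=/p")

printf '%s\n' "$result"
```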

How to grep this kind of pattern out using this "grep" command?

In file named appleFile:
1.apple_with_seeds###
2.apple_with_seeds###
3.apple_with_seeds_and_skins###
4.apple_with_seeds_and_skins###
5.apple_with_seeds_and_skins###
.....
.....
.....
How can I use the grep command to match only the lines with "apple_with_seeds"?
Assume there are random characters after seeds and skins.
Result:
1.apple_with_seeds###
2.apple_with_seeds###
Maybe something like this will work for you:
grep 'apple_with_seeds[^_]' appleFile
That will print all lines where the character right after seeds is not _. You can add other characters to exclude between the brackets (but after the ^), e.g. [^_a-z] will additionally exclude all lower case letters.
Or you could explicitly include some characters (like #):
grep 'apple_with_seeds[#]*$' appleFile
And again you can add arbitrary characters between the brackets, e.g. [#A-Z] would match any of the characters # or A-Z.
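Both bracket-expression variants can be tried on a small made-up copy of appleFile (the temp file and the variable names negated and trailing are assumptions for illustration). One caveat of the negated form: [^_] needs some character after seeds, so a line ending exactly in apple_with_seeds would not match it.

```shell
tmp=$(mktemp)
printf '%s\n' \
    '1.apple_with_seeds###' \
    '2.apple_with_seeds###' \
    '3.apple_with_seeds_and_skins###' > "$tmp"

# Negated set: the character after "seeds" must not be an underscore.
negated=$(grep 'apple_with_seeds[^_]' "$tmp")

# Explicit set: only # characters may follow until the end of the line.
trailing=$(grep 'apple_with_seeds[#]*$' "$tmp")

printf '%s\n' "$negated" "$trailing"
rm -f "$tmp"
```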
cat appleFile | grep "apple_with_seeds$"
UPDATE:
if you want to exclude something, try -v option:
cat appleFile | grep "apple_with_seeds$" | grep -v "exclude_pattern"
Try this
cat appleFile|grep -i seeds$
