How to grep a date in mm/dd/yy format where dd can be any day, in bash?

I'm trying the following and unable to get the desired results:
cat sfd_log.txt | grep '12/*/17' | more
My objective is to get all December 2017 entries from the file. Thank you.

grep uses (optionally extended) regular expression syntax. Bash uses globbing. Similar enough to be confusing, but different enough to be important:
grep '12/.*/17' sfd_log.txt | more
Or, more precise if you like:
grep -E '12/[0-9][0-9]?/17' sfd_log.txt | more
What you actually asked for with '12/*/17' is:
a literal '12', followed by zero or more literal '/', followed by a literal '/17'.
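To see the difference concretely, here is a minimal sketch that runs both patterns over a few made-up lines. The sample lines and the function names (sample_log, glob_style, december_2017) are assumptions for illustration only; the real format of sfd_log.txt is not shown in the question.

```shell
# Hypothetical sample lines standing in for sfd_log.txt.
sample_log() {
  printf '%s\n' \
    '12/01/17 job started' \
    '12/5/17 job finished' \
    '11/30/17 previous month' \
    '12/17 no day field' \
    '12/25/18 wrong year'
}

# The glob-style pattern read as a regex: '12', zero or more '/', then '/17'.
glob_style() { grep '12/*/17'; }

# The regex from the answer: a one- or two-digit day between the slashes.
december_2017() { grep -E '12/[0-9][0-9]?/17'; }

echo 'glob-style pattern:'
sample_log | glob_style
echo 'regex pattern:'
sample_log | december_2017
```

With these sample lines, the glob-style pattern selects only the line without a day field, while the regex selects the two December 2017 entries.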

Related

How to capture digits in front of specific keyword in bash

Imagine following string:
<tr><td>12,3</td><td>deg</td><td>23,4</td><td>humi</td><td>34,5</td><td>press</td></tr>
In bash, how do I extract 23,4, given that it is followed by humi?
grep -o works well for this sort of thing. I'm sure performance would be better with a single sed command than two greps but that's rarely a serious concern.
X='<tr><td>12,3</td><td>deg</td><td>23,4</td><td>humi</td><td>34,5</td><td>press</td></tr>'
echo "$X" | grep -o '[0-9,.]*</td><td>humi' | grep -o '[0-9,.]*'
# Result: 23,4
You can additionally pipe the result through tr , . to get the English number format (23.4).
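For comparison, the single-sed variant mentioned above might look like the following sketch. The function name value_before is hypothetical, and the keyword is interpolated into the regex unescaped, so this assumes keywords without regex metacharacters.

```shell
X='<tr><td>12,3</td><td>deg</td><td>23,4</td><td>humi</td><td>34,5</td><td>press</td></tr>'

# Print the number in the <td> cell directly before the cell containing $1.
# '|' is used as the sed delimiter so the slashes in </td> need no escaping.
value_before() {
  sed -n "s|.*<td>\([0-9,.]*\)</td><td>$1</td>.*|\1|p"
}

printf '%s\n' "$X" | value_before humi            # 23,4
printf '%s\n' "$X" | value_before humi | tr , .   # 23.4
```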

Regex - Pattern Matching in Shell

I am trying to match a pattern and extract the value that comes after it. I used the regex below, but it didn't help: nothing was extracted, and I got a blank value when I echoed the result.
Can someone tell me what mistake I made?
Sample input:
class="remove_link_style">Site Issue - Please check</a></td><td>
Working</td><td>
<ahref="/0051043899"class="remove_link_style">
Pattern used: text=$(echo "class="remove_link_style">Site Issue - Please check</a></td><td>Working</td><td><ahref="/0051043899"class="remove_link_style">" | grep -o --perl-regexp "(?class="remove_link_style")[a-zA-Z0-9_]+"")
I also wanted to extract the string that comes after class="remove_link_style" but before </a></td><td>
There is plenty of advice against parsing XML with tools like grep/sed/awk. With that in mind, I would advise using a proper parser such as xmllint (http://xmlsoft.org/xmllint.html) or xmlstarlet (http://xmlstar.sourceforge.net/doc/xmlstarlet.txt). But if you just want to extract the contents quickly, you can combine grep and cut as below.
echo 'class="remove_link_style">GB|Trekkinn-UK|Manualcrawlrequest|1</a></td><td>WorkInProgress</td><td><ahref="/0051043899"class="remove_link_style">' | grep -Eo '(style"|<td)>[^<>]+' | cut -f2 -d">"
This prints out:
GB|Trekkinn-UK|Manualcrawlrequest|1
WorkInProgress
EDIT: As the OP requested, here is how to store the output in an array.
If you need the output stored in an array, set IFS to a newline first, since the elements contain whitespace.
IFS=$'\n'
result=($(echo 'class="remove_link_style">Site Issue - Please check</a></td><td>Working</td><td><ahref="/0051043899"class="remove_link_style">' | grep -Eo '(style"|<td)>[^<>]+' | cut -f2 -d">"))
unset IFS
for i in "${result[@]}"; do echo "$i"; done
Site Issue - Please check
Working
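An alternative that avoids changing IFS globally is to read the lines into the array one at a time. This is only a sketch: the helper name extract_cells is hypothetical, and its pattern is an assumed variant that accepts text after either style" or a plain <td.

```shell
#!/usr/bin/env bash
html='class="remove_link_style">Site Issue - Please check</a></td><td>Working</td><td><ahref="/0051043899"class="remove_link_style">'

# Print the text that follows 'style">' or '<td>', one value per line.
extract_cells() {
  grep -Eo '(style"|<td)>[^<>]+' | cut -d'>' -f2
}

# Read line by line so spaces inside a value do not split it.
result=()
while IFS= read -r line; do
  result+=("$line")
done < <(printf '%s\n' "$html" | extract_cells)

for i in "${result[@]}"; do
  printf '%s\n' "$i"
done
```

Setting IFS only for the read command keeps the change local to that command, so no unset is needed afterwards.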

Getting max version by file name

I need to write a shell script that does the following:
1. In a given folder with files that fit the pattern update-8.1.0-v46.sql, find the maximum version.
2. Write the maximum version found into a configuration file.
For 1, I've found the following answer: Shell script: find maximum value in a sequence of integers without sorting
The only problem I have is that I can't get down to a list of only the versions,
I tried:
ls | grep -o "update-8.1.0-v\(\d*\).sql"
but I get the entire file name in return and not just the matching part
Any ideas?
Maybe move everything to awk?
I ended up using:
SCHEMA=`ls database/targets/oracle/ | grep -o "update-$VERSION-v.*.sql" | sed "s/update-$VERSION-v\([0-9]*\).sql/\1/p" | awk '$0>x{x=$0};END{print x}'`
based on dreamer's answer
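You can use sed for this:
echo "update-8.1.0-v46.sql" | sed -n 's/update-8\.1\.0-v\([0-9]*\)\.sql/\1/p'
The output in this case will be 46. The -n option matters here: without it, sed would print the line twice, once for the p flag and once by default.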
grep isn't really the best tool for extracting captured matches, but you can use look-behind assertions if you switch it to use perl-like regular expressions. Anything in the assertion will not be printed when using the -o flag.
ls | grep -Po "(?<=update-8.1.0-v)\d+"
46
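Putting the pieces together, the maximum version can also be found without grep or sed by using shell parameter expansion. This is a swapped-in technique rather than any of the approaches above, and the function name max_update_version and the temporary demo files are assumptions for illustration.

```shell
# Print the highest N among files named update-<version>-vN.sql in a directory.
# $1 = directory, $2 = version string such as 8.1.0
# The body runs in a subshell, so its variables do not leak to the caller.
max_update_version() (
  max=
  for f in "$1"/update-"$2"-v*.sql; do
    [ -e "$f" ] || continue            # the glob matched nothing
    n=${f##*-v}                        # drop everything up to the last "-v"
    n=${n%.sql}                        # drop the extension
    case $n in
      ''|*[!0-9]*) continue ;;         # skip suffixes that are not plain digits
    esac
    if [ -z "$max" ] || [ "$n" -gt "$max" ]; then
      max=$n
    fi
  done
  printf '%s\n' "$max"
)

# Demo with throwaway files in a temporary directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/update-8.1.0-v9.sql" \
      "$demo_dir/update-8.1.0-v46.sql" \
      "$demo_dir/update-8.1.0-v101.sql" \
      "$demo_dir/update-8.0.0-v200.sql"
max_update_version "$demo_dir" 8.1.0   # 101
rm -rf "$demo_dir"
```

The comparison is numeric, so v101 correctly ranks above v46 and v9, which a plain lexical sort would get wrong.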

grep pipe searching for one word, not line

For some reason I cannot get this to output just the version of this line. I suspect it has something to do with how grep interprets the dash.
This command:
admin@DEV:~/TEMP$ sendemail
Yields the following:
sendemail-1.56 by Brandon Zehm
More output below omitted
The first line is of interest. I'm trying to store the version to variable.
TESTVAR=$(sendemail | grep '\s1.56\s')
Does anyone see what I am doing wrong? Thanks
TESTVAR is just empty. Even without TESTVAR, the output is empty.
I just tried the following too, thinking this might work.
sendemail | grep '\<1.56\>'
I just tried it again while editing, and I think I have a different issue. Perhaps I'm not handling the output correctly. It's outputting the entire line, but I can see that grep is finding 1.56 because it highlights it in the line.
$ TESTVAR=$(echo 'sendemail-1.56 by Brandon Zehm' | grep -Eo '1\.56')
$ echo "$TESTVAR"
1.56
The point is grep -Eo '1\.56' (the dot is escaped so that it matches only a literal period).
From the grep man page:
-E, --extended-regexp
Interpret PATTERN as an extended regular expression (ERE, see below). (-E is specified by POSIX.)
-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
Your regular expression doesn't match the form of the version. You have specified that the version is surrounded by whitespace, yet in front of it you have a dash.
Replace the first \s with the capitalized form \S, or with an explicit set of characters, and it should work.
I'm wondering: In your example you seem to know the version (since you grep for it), so you could just assign the version string to the variable. I assume that you want to obtain any (unknown) version string there. The regular expression for this in sed could be (using POSIX character classes):
sendemail |sed -n -r '1 s/sendemail-([[:digit:]]+\.[[:digit:]]+).*/\1/ p'
The -n suppresses the normal default output of every line; -r enables extended regular expressions; the leading 1 tells sed to only work on line 1 (I assume the version appears in the first line). I anchored the version number to the telltale string sendemail- so that potential other numbers elsewhere in that line are not matched. If the program name changes or the hyphen goes away in future versions, this wouldn't match any longer though.
Both the grep solution above and this one have the disadvantage of reading the whole output, which (as emails go these days) may be long. In addition, grep would find all other lines in the program's output that contain the pattern (if it's indeed emails, somebody might discuss this problem in them, with examples!). If the version is indeed on the first line, piping through head -n 1 first would be efficient and prudent.
jayadevan@jayadevan-Vostro-2520:~$ echo $sendmail
sendemail-1.56 by Brandon Zehm
jayadevan@jayadevan-Vostro-2520:~$ echo $sendmail | cut -f2 -d "-" | cut -f1 -d" "
1.56
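The advice to look only at the first line and to anchor on the sendemail- prefix can be combined in one small sketch. The function fake_sendemail is a hypothetical stand-in that prints a banner like the one quoted in the question, so the example runs without the real program; banner_version is likewise an illustrative name.

```shell
# Hypothetical stand-in for the real command's output.
fake_sendemail() {
  printf '%s\n' \
    'sendemail-1.56 by Brandon Zehm' \
    'More output below omitted' \
    'a later line that also mentions sendemail-9.99'
}

# Keep only the first line, then print the major.minor version after "sendemail-".
banner_version() {
  head -n 1 | sed -n -E 's/^sendemail-([0-9]+\.[0-9]+).*/\1/p'
}

TESTVAR=$(fake_sendemail | banner_version)
echo "$TESTVAR"   # 1.56
```

Because head stops after one line, the later mention of another version number never reaches sed.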

I have some trouble with "grep" and collating symbols

Here is my problem. I have an existing file named data.f, and I want to use the collating symbol "48" in a bracket expression to match "48" in that file:
grep '[[.48.]]' data.f
but I get this error:
grep: Invalid collation character
However, there is no problem with character classes in bracket expressions:
grep "[[:alpha:]]" data.f
If you want to grep 48:
grep 48 file
If you want to grep "48" (with the quotes):
grep '"48"' file
To avoid a discussion in the comments, I'm extending my post with more examples.
If you want to grep lines with exactly n occurrences of "48", you should use regular expressions:
cat file | grep '\(.*"48"\)\{n\}' | grep -v '\(.*"48"\)\{n+1\}'
Basically, you grep for lines with at least n occurrences, and then with invert-match you exclude lines with n+1 or more occurrences, so you are left with lines that have exactly n occurrences. (Here n and n+1 are placeholders for actual numbers.)
In your comment you mentioned that you wanted to grep lines with 5 occurrences of "48" that CAN be separated by other characters (that's the reason I put .* before "48").
So here is the sample:
cat file | grep '\(.*"48"\)\{5\}' | grep -v '\(.*"48"\)\{6\}'
Wouldn't grep '48' data.f work?
I have no idea what you mean by “I use collating symbol "48"” (I know what collation classes are, which is what grep expects to see in your input, but I don't know what a collation symbol would be), but from one of your comments, it seems you're actually looking for the exact string [[.48.]] in your file. Here are two ways of doing just that:
grep -F '[[.48.]]' data.f
grep '\[\[\.48\.]]' data.f
In one of your other comments, you asked for how to ask grep for lines with at least five occurrences of “48” on them. That's a pretty clear regex question:
grep -E '(.*48){5}' data.f
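The "at least n" and "exactly n" variants discussed in this thread can be wrapped in a small helper. This is a sketch under assumptions: the function name lines_with_exactly and the sample lines are made up, and the search string 48 is hard-coded as in the answers.

```shell
# Print the lines from stdin that contain exactly $1 occurrences of "48".
# First keep lines with at least n occurrences, then drop those with n+1 or more.
lines_with_exactly() {
  n=$1
  grep -E "(.*48){$n}" | grep -Ev "(.*48){$((n + 1))}"
}

# Hypothetical sample lines with 2, 3, 1 and 0 occurrences respectively.
printf '%s\n' 'a48b48' '48 48 48' 'only 48 once' 'none' | lines_with_exactly 2
```

With these sample lines, only a48b48 is printed: it passes the "at least 2" filter and is not removed by the "3 or more" filter.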
