Is there an elegant regex to grep a MAC address accounting for various delimiters? - shell

I occasionally search aggregated log files on my syslog server for a specific MAC address. Since each source uses a different format for MAC addresses, I usually use this command:
less syslog.log | grep -i -E '56[:-\.]?ea[:-\.]?b6[:-\.]?a6[:-\.]?82[:-\.]?5e'
Which will find the address regardless of the format or case (56eab6a6825e, 56ea.b6a6.825e, 56:ea:b6:a6:82:5e, 56-EA-B6-A6-82-5E).
I have this command saved in a text file so I can just replace each hex pair with the relevant digits and paste it in, but is there an elegant way to format my regex so that I can keep the whole address together? For example:
less syslog.log | grep -i -E '56eab6a6825e[:-\.]?(anywhereinthestring)'
I basically want to be more lazy when searching, but I don't understand lookarounds enough to know if they are applicable in this case. Is this even possible?

Simply store the MAC address to search for in a variable and use Bash's pattern-substitution parameter expansion to generate the regex for grep:
mac='56:ea:b6:a6:82:5e'
# Compose a regex on the fly by replacing all colons with [:.-]?
grep -iE "${mac//:/[:.-]?}"
Or same as a function:
grepmac() {
# Usage:
# grepmac MAC_ADDRESS FILE [FILE]...
# Parses input argument 1 as mac-address
# regardless if it uses delimiters or not.
# Returns failure if input argument 1 is not a mac-address.
[[ $1 =~ ([[:xdigit:]]{2})[:.-]?([[:xdigit:]]{2})[:.-]?([[:xdigit:]]{2})[:.-]?([[:xdigit:]]{2})[:.-]?([[:xdigit:]]{2})[:.-]?([[:xdigit:]]{2}) ]] || return 1
# Sets delimiter locally to : to join matches with colon
local -- IFS=:
# Joins matches except first to get a colon-delimited mac-address
mac="${BASH_REMATCH[*]:1}"
# Shifts out first argument away to only keep remaining file paths
shift
# Composes a regex by replacing colons in mac-address by [:.-]?
# which matches optional delimiter with : . or -
regex=${mac//:/[:.-]?}
# Performs the actual search
grep -iE "$regex" "$@"
}
Traced execution of grepmac:
$ set -x; grepmac 56eab6a6825e
+ grepmac 56eab6a6825e
+ [[ 56eab6a6825e =~ ([[:xdigit:]]{2})[:.-]?([[:xdigit:]]{2})[:.-]?([[:xdigit:]]{2})[:.-]?([[:xdigit:]]{2})[:.-]?([[:xdigit:]]{2})[:.-]?([[:xdigit:]]{2}) ]]
+ IFS=:
+ mac=56:ea:b6:a6:82:5e
+ shift
+ regex='56[:.-]?ea[:.-]?b6[:.-]?a6[:.-]?82[:.-]?5e'
+ grep --color=auto -iE '56[:.-]?ea[:.-]?b6[:.-]?a6[:.-]?82[:.-]?5e'
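To check that the generated pattern really covers all four formats from the question, here is a minimal sketch. The helper name mac_to_regex and the sample log lines are made up for illustration; only the substitution itself comes from the answer above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a colon-delimited MAC into a delimiter-tolerant regex.
mac_to_regex() {
  local mac=$1
  # Replace every colon with an optional delimiter class.
  printf '%s\n' "${mac//:/[:.-]?}"
}

regex=$(mac_to_regex '56:ea:b6:a6:82:5e')

# Made-up lines standing in for syslog entries; the last one must not match.
matches=$(printf '%s\n' \
  'host1 56eab6a6825e up' \
  'host2 56ea.b6a6.825e up' \
  'host3 56:ea:b6:a6:82:5e up' \
  'host4 56-EA-B6-A6-82-5E up' \
  'host5 aa:bb:cc:dd:ee:ff up' | grep -ciE "$regex")

echo "$regex"
echo "matching lines: $matches"
```

The hyphen sits last inside the brackets so that it is taken literally instead of forming a range.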

Here's a function that takes a MAC address—uppercase or lowercase, with or without punctuation—and constructs a regex from it. You can run it just like grep, with either a list of files or nothing to read from stdin.
grep-mac() {
local mac="$1"
local files=("${@:2}")
# Strip punctuation from the input MAC.
mac="${mac//[^[:alnum:]]}"
# Create a regex by inserting `[:.-]?` in between every two characters.
# (The hyphen goes last in the brackets so it is literal, not a range.)
local regex="${mac:0:2}$(sed -E 's/../[:.-]?&/g' <<< "${mac:2}")"
# Call `grep` with the regex and files we were passed.
grep -iE "$regex" "${files[@]}"
}
Example usage:
❯ grep-mac 56:ea:b6:a6:82:5e syslog.log | less
❯ grep-mac 56EAB6A6825E syslog.log | less
You can put it in your ~/.bashrc if you want easy access.

You can try this grep
$ grep -Ei '56[[:alnum:]:.-]+5e' <(less syslog.log)

Related

Bash command - how to grep and then truncate but keep grep-ed part?

I am trying to splice out a particular piece of string. I used:
myVar=$(grep --color 'GACCT[ATCG]*AGGTC' FILE.txt | cat)
then, I used the code below to remove everything before and after my desired portion.
myVar1=$(echo ${myVar##*GACCT})
echo ${myVar1%%AGGTC*}
The code works; however, it cuts off the GACCT and AGGTC at the beginning and end of the fragment that I want to keep. Is there any way to cut the beginning and end off while still keeping the GACCT and AGGTC?
Thank you!
If you have a GNU grep, you can make use of
myVar=$(grep --color=never -oP 'GACCT\K[ATCG]+(?=AGGTC)' FILE.txt)
See the online demo:
#!/bin/bash
s='GACCTAAATTTGGGCCCAGGTC'
# Original script
myVar=$(grep --color 'GACCT[ATCG]*AGGTC' <<< "$s" | cat)
myVar1=$(echo ${myVar##*GACCT})
echo ${myVar1%%AGGTC*}
# => AAATTTGGGCCC
# My suggestion:
grep --color=never -oP 'GACCT\K[ATCG]+(?=AGGTC)' <<< "$s"
# => AAATTTGGGCCC
With --color=never, your matches are not colored.
The -o option outputs the matched texts, and the -P option enables the PCRE regex engine. It is necessary here since the regex pattern contains PCRE-specific operators, like \K and (?=...).
More details
GACCT - a literal string
\K - operator that makes the regex engine "forget" what has been consumed
[ATCG]+ - one or more letters from the set
(?=AGGTC) - a positive lookahead that requires an AGGTC string immediately to the right of the current location.
Note you can get this result with pcregrep, too, if you install it:
myVar=$(pcregrep -o 'GACCT\K[ATCG]+(?=AGGTC)' FILE.txt)
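Note that \K and the lookahead leave GACCT and AGGTC out of the printed match. If the markers should stay in the output, as the question asks, a plain -o match is probably enough. A minimal sketch, assuming a made-up sample string with extra bases on both sides:

```shell
#!/usr/bin/env bash
# Made-up sample read containing one GACCT...AGGTC fragment.
s='TTTGACCTAAATTTGGGCCCAGGTCTTT'

# grep -o prints only the matched text; here the markers are part of the match.
with_grep=$(grep -oE 'GACCT[ATCG]*AGGTC' <<< "$s")

# Same idea in pure Bash: index 0 of BASH_REMATCH holds the whole match.
re='GACCT[ATCG]*AGGTC'
with_bash=''
if [[ $s =~ $re ]]; then
  with_bash=${BASH_REMATCH[0]}
fi

echo "$with_grep"
echo "$with_bash"
```

Both variants need only extended regular expressions, so they do not depend on grep being built with PCRE support.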

Find string then from there pull numbers

I'm starting to code in Bash and am not the best at it, but I have a situation. I have output like:
Configuration file 'hello2.conf' is in use by process 735.
Ending
I want to extract the process ID 735.
I have seen answers that extract ONLY the numbers from the output, but then I am left with 2735.
How can I go about extracting 735 from the output? I was thinking of searching for "process" and then grabbing the number after it, perhaps?
Thanks!
Use GNU grep with its Perl Compatible Regular Expression capabilities enabled with the -P flag and print only the matching entry using -o flag.
grep -Po 'process \K[0-9]+' <<<"Configuration file 'hello2.conf' is in use by process 735."
735
Use it in a command line as
.. | grep -Po 'process \K[0-9]+'
where the \K escape sequence stands for
\K: This sequence resets the starting point of the reported match. Any previously matched characters are not included in the final matched sequence.
You might want to use a regular expression:
[[ "$line" =~ ([0-9]+)\.$ ]] && echo "${BASH_REMATCH[1]}"
This should match any number at the end of the line, select the number part, and print it!
Good Luck!
If your line format stays the same, use cut -d" " -f 9 (the result keeps the trailing period: 735.)
sed can extract only the numbers at the specific location of the message (using \(...\) match grouping and \1 replacement).
... | sed "s#^Configuration file '.*' is in use by process \([0-9]*\)\.#\1#"
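Without -n, sed prints lines that do not match (such as Ending) unchanged. A minimal sketch that prints only the captured number, using the two-line output from the question as input; the variable names are arbitrary:

```shell
#!/usr/bin/env bash
# The two-line output from the question, held in a variable for the demo.
output="Configuration file 'hello2.conf' is in use by process 735.
Ending"

# -n suppresses automatic printing; the p flag prints only lines where
# the substitution succeeded, so "Ending" is dropped.
pid=$(sed -n 's/.*is in use by process \([0-9][0-9]*\)\.$/\1/p' <<< "$output")

echo "$pid"
```

Anchoring on the word "process" keeps the 2 in hello2.conf out of the result.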

how to extract word from grep result in shell?

Using shell i want to search and print only sub-string with next word to that sub-string.
e.g. logfile has line "today is monday and this is:1234 so I am in."
if grep -q "this is :" ./logfile; then
#here i want to print only sub-string with next word i.e. "this is:1234"
#echo ???
fi
You can use sed with \1 to display the matched string in \(..\):
sed 's/.*\(this is:[0-9a-zA-Z]*\).*/\1/' logfile
EDIT: The above command is only fine for 1 line input.
When you have a file with more lines, you only want to print the lines that match:
sed -n 's/.*\(this is:[0-9a-zA-Z]*\).*/\1/p' logfile
When you have a large file and only want to see the first match, you can combine this command with head -1, but you would like to stop scanning/parsing after the first match. You can use q to quit, but you only want to quit after a match.
sed -n '/.*\(this is:[0-9a-zA-Z]*\).*/{s//\1/p;q}'
You can use a regular expression with a look-behind, if you want only the next word:
$ grep --perl-regexp -o '(?<=(this is:))(\S+)' ./logfile
1234
If you want both, then just:
$ grep --perl-regexp -o 'this is:\S+' ./logfile
this is:1234
The -o option instructs grep to return only the matching part.
In the commands above, we assumed that a "word" is a sequence of non-space characters. You can adjust that according to your needs.
If you have a system with GNU extensions (but aren't certain it was compiled with optional PCRE support), consider:
if result=$(grep -E -m 1 -o 'this is:[^[:space:]]+' logfile); then
echo "value is: ${result#*:}"
fi
${varname#value} expands to the contents of varname, but with value stripped from the beginning if present. Thus, ${result#*:} strips everything up to the first colon in result.
However, this may not work on systems without the non-POSIX options -o or -m.
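The difference between stripping the shortest and the longest matching prefix matters as soon as the value itself contains a colon. A small sketch, assuming a made-up value with two colons:

```shell
#!/usr/bin/env bash
# Made-up match whose value part contains a second colon.
result='this is:12:34'

# One # removes the shortest prefix matching *: (up to the first colon).
after_first=${result#*:}

# Two ## remove the longest prefix matching *: (up to the last colon).
after_last=${result##*:}

echo "$after_first"
echo "$after_last"
```

For the log line in the question both forms give 1234, since it has a single colon.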
If you want to support non-GNU systems, awk is a tool worth considering. Unlike answers requiring nonportable extensions (like grep -P), this should work on any modern platform (tested with GNU awk, recent BSD awk, and mawk; also, no warnings with gawk --posix --lint):
# note that the constant 8 is the length of "this is:"
# GNU awk has cleaner syntax, but trying to be portable here.
if value=$(awk '
BEGIN { matched=0; } # by default, this will trigger END to exit as failure
/this is:/ {
match($0, /this\ is:([^[:space:]]+)/);
print substr($0, RSTART+8, RLENGTH-8);
matched=1; # tell END block to use zero exit status
exit(0); # stop processing remaining file contents, jump to END
}
END { if(matched == 0) { exit(1); } }
'); then
echo "Found value of $value"
else
echo "Could not find a value in file"
fi
You can look for everything up to, but not including the next space like this:
grep -Eo "this is:[^[:space:]]+" logfile
The [] introduces a set of characters, and the ^ at the start complements it, so [^[:space:]] matches any character that is not whitespace. The + requires one or more such characters.
The -E tells grep to use extended regular expressions and the -o means to only print the matched part.

Using BASH, how to increment a number that uniquely only occurs once in most lines of an HTML file?

The target is always going to be between two characters, 'E' and '/' and there will never be but one occurrence of this combination, e.g. 'E01/' in most lines in the HTML file and will always be between '01' and '90'.
So, I need to programmatically read the file and replace each occurrence of 'Enn/' where 'nn' in 'Enn/' will be between '01' and '90' and must maintain the '0' for numbers '01' to '09' in 'Enn/' while incrementing the existing number by 1 throughout the HTML file.
Is this doable and if so how best to go about it?
Edit: Target lines will be in one or the other formats:
<DT>ProgramName
<DT>Program Name
You can use sed inside BASH as a fantastic one-liner, either:
sed -ri 's/(.*E)([0-9]{2})(\/.*)/printf "\1%02u\3" $((10#\2+(10#\2>=90?0:1)))/ge' FILENAME
or if you are guaranteed the number is lower than 100:
sed -ri 's/(.*E)([0-9]{2})(\/.*)/printf "\1%02u\3" $((10#\2+1))/ge' FILENAME
Basically, you'll be doing an in-place search and replace. The first command will not add anything after 90 (since you didn't specify the exact nature of the overflow condition). So E89/ -> E90/, E90/ -> E90/, and if by chance you have E91/, it will remain E91/. Add this line inside a loop for multiple files.
A small explanation of the above command:
-r enables extended regular expressions (so groups and {2} need no backslashes)
-i states to write back to the same file (be careful with overwriting!)
s/search/replace/ge this is the regex command you'll be using
s/ states you'll be using a string search
(.*E) first grouping: all characters up to and including the E that precedes the two digits (case sensitive)
([0-9]{2}) second grouping of numbers 0 through 9, repeated twice (fixed width)
(\/.*) third grouping: the escaped trailing slash and everything after it
/ (slash separator) denotes end of search pattern and beginning of replacement pattern
printf "format" var this is the expression used for each replacement
\1 place first grouping found here
%02u the replace format for the var
\3 place third grouping found here
$((expression)) BASH arithmetic expression to use in printf format
10#\2 force second grouping as a base 10 number
+(10#\2>=90?0:1) add 0 or 1 to the second grouping based on if it is >= 90 (as used in first command)
+1 add 1 to the second grouping (see second command)
/ge flags: g for global replacement, e (a GNU sed extension) to execute the resulting pattern space as a shell command and replace it with the command's output
GNU sed and awk are very powerful tools to do this sort of thing.
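The same steps (capture three groups, force base 10, add one, re-pad to two digits) can also be sketched in pure Bash without the GNU-only e flag. The function name bump_line and the sample lines are hypothetical, since the real HTML lines are not shown here:

```shell
#!/usr/bin/env bash
# Hypothetical helper: bump the two-digit number in "Enn/" by one, capped at 90.
bump_line() {
  local line=$1
  local re='^(.*E)([0-9]{2})(/.*)$'
  if [[ $line =~ $re ]]; then
    local head=${BASH_REMATCH[1]} tail=${BASH_REMATCH[3]}
    # 10# forces base 10, so 08 and 09 are not rejected as invalid octal.
    local n=$((10#${BASH_REMATCH[2]}))
    if (( n < 90 )); then
      n=$((n + 1))
    fi
    # %02d restores the leading zero for single-digit results.
    printf '%s%02d%s\n' "$head" "$n" "$tail"
  else
    # Lines without a target pass through unchanged.
    printf '%s\n' "$line"
  fi
}

bump_line 'shows/E08/index.html'
bump_line 'shows/E90/index.html'
bump_line 'no target here'
```

To process a file, one option is to feed it line by line, for example with while IFS= read -r line; do bump_line "$line"; done < FILENAME, writing to a new file instead of editing in place.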
You can use the following perl one-liner to increment the numbers while maintaining the ones with leading 0s.
perl -pe 's/E\K([0-9]+)/sprintf "%02d", 1+$1/e' file
$ cat file
<DT>ProgramName
<DT>Program Name
<DT>Program Name
<DT>Program Name
$ perl -pe 's/E\K([0-9]+)/sprintf "%02d", 1+$1/e' file
<DT>ProgramName
<DT>Program Name
<DT>Program Name
<DT>Program Name
You can add the -i option to make changes in-place. I would recommend creating backup before doing so.
Not as elegant as one line sed!
Break the commands used into multiple commands and you can debug your bash or grep or sed.
# find the number
# use -o to grep to just return pattern
# use head -n1 for safety to just get 1 number
n=$(grep -o "E[0-9][0-9]\/" file.html |grep -o "[0-9][0-9]"|head -n1)
#octal 08 and 09 are problem so need to do this
n1=10#$n
echo Debug n1=$n1 n=$n
n2=n1
# bash arithmetic done inside (( ))
# as ever with bash bracketing whitespace is needed
(( n2++ ))
echo debug n2=$n2
# use sed with -i -e for inline edit to replace number
sed -i -e "s/E$n\//E$(printf '%02d' $n2)\//" file.html
grep "E[0-9][0-9]" file.html
awk might be better. Maybe could do it in one awk command also.
The sed one-liner in other answer is awesome :-)
This works in bash or sh.
http://unixhelp.ed.ac.uk/CGI/man-cgi?grep

Capturing Groups From a Grep RegEx

I've got this little script in sh (Mac OSX 10.6) to look through an array of files. Google has stopped being helpful at this point:
files="*.jpg"
for f in $files
do
echo $f | grep -oEi '[0-9]+_([a-z]+)_[0-9a-z]*'
name=$?
echo $name
done
So far (obviously, to you shell gurus) $name merely holds 0, 1 or 2, depending on if grep found that the filename matched the matter provided. What I'd like is to capture what's inside the parens ([a-z]+) and store that to a variable.
I'd like to use grep only, if possible. If not, please no Python or Perl, etc. sed or something like it – I would like to attack this from the *nix purist angle.
Also, as a super-cool bonus, I'm curious as to how I can concatenate strings in shell. If the group I captured was the string "somename" stored in $name, and I wanted to add the string ".jpg" to the end of it, could I cat $name '.jpg'?
If you're using Bash, you don't even have to use grep:
files="*.jpg"
regex="[0-9]+_([a-z]+)_[0-9a-z]*"
for f in $files # unquoted in order to allow the glob to expand
do
if [[ $f =~ $regex ]]
then
name="${BASH_REMATCH[1]}"
echo "${name}.jpg" # concatenate strings
name="${name}.jpg" # same thing stored in a variable
else
echo "$f doesn't match" >&2 # this could get noisy if there are a lot of non-matching files
fi
done
It's better to put the regex in a variable. Some patterns won't work if included literally.
This uses =~ which is Bash's regex match operator. The results of the match are saved to an array called $BASH_REMATCH. The first capture group is stored in index 1, the second (if any) in index 2, etc. Index zero is the full match.
You should be aware that without anchors, this regex (and the one using grep) will match any of the following examples and more, which may not be what you're looking for:
123_abc_d4e5
xyz123_abc_d4e5
123_abc_d4e5.xyz
xyz123_abc_d4e5.xyz
To eliminate the second and fourth examples, make your regex like this:
^[0-9]+_([a-z]+)_[0-9a-z]*
which says the string must start with one or more digits. The caret represents the beginning of the string. If you add a dollar sign at the end of the regex, like this:
^[0-9]+_([a-z]+)_[0-9a-z]*$
then the third example will also be eliminated since the dot is not among the characters in the regex and the dollar sign represents the end of the string. Note that the fourth example fails this match as well.
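The effect of the anchors can be checked directly against the four example names. A minimal sketch; the helper name count_matches is made up:

```shell
#!/usr/bin/env bash
# The four example names from the text above.
names=(123_abc_d4e5 xyz123_abc_d4e5 123_abc_d4e5.xyz xyz123_abc_d4e5.xyz)

# Hypothetical helper: count how many of the names a given regex matches.
count_matches() {
  local re=$1 name count=0
  for name in "${names[@]}"; do
    if [[ $name =~ $re ]]; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}

count_matches '[0-9]+_([a-z]+)_[0-9a-z]*'     # no anchors: 4
count_matches '^[0-9]+_([a-z]+)_[0-9a-z]*'    # start anchor: 2
count_matches '^[0-9]+_([a-z]+)_[0-9a-z]*$'   # both anchors: 1
```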
If you have GNU grep (around 2.5 or later, I think, when the \K operator was added):
name=$(echo "$f" | grep -Po '(?i)[0-9]+_\K[a-z]+(?=_[0-9a-z]*)').jpg
The \K operator (variable-length look-behind) causes the preceding pattern to match, but doesn't include the match in the result. The fixed-length equivalent is (?<=) - the pattern would be included before the closing parenthesis. You must use \K if quantifiers may match strings of different lengths (e.g. +, *, {2,4}).
The (?=) operator matches fixed or variable-length patterns and is called "look-ahead". It also does not include the matched string in the result.
In order to make the match case-insensitive, the (?i) operator is used. It affects the patterns that follow it so its position is significant.
The regex might need to be adjusted depending on whether there are other characters in the filename. You'll note that in this case, I show an example of concatenating a string at the same time that the substring is captured.
This isn't really possible with pure grep, at least not generally.
But if your pattern is suitable, you may be able to use grep multiple times within a pipeline to first reduce your line to a known format, and then to extract just the bit you want. (Although tools like cut and sed are far better at this).
Suppose for the sake of argument that your pattern was a bit simpler: [0-9]+_([a-z]+)_ You could extract this like so:
echo $name | grep -Ei '[0-9]+_[a-z]+_' | grep -oEi '[a-z]+'
The first grep would remove any lines that didn't match your overall pattern, the second grep (which has --only-matching specified) would display the alpha portion of the name. This only works because the pattern is suitable: "alpha portion" is specific enough to pull out what you want.
(Aside: Personally I'd use grep + cut to achieve what you are after: echo $name | grep {pattern} | cut -d _ -f 2. This gets cut to parse the line into fields by splitting on the delimiter _, and returns just field 2 (field numbers start at 1)).
Unix philosophy is to have tools which do one thing, and do it well, and combine them to achieve non-trivial tasks, so I'd argue that grep + sed etc is a more Unixy way of doing things :-)
I realize that an answer was already accepted for this, but from a "strictly *nix purist angle" it seems like the right tool for the job is pcregrep, which doesn't seem to have been mentioned yet. Try changing the lines:
echo $f | grep -oEi '[0-9]+_([a-z]+)_[0-9a-z]*'
name=$?
to the following:
name=$(echo $f | pcregrep -o1 -Ei '[0-9]+_([a-z]+)_[0-9a-z]*')
to get only the contents of the capturing group 1.
The pcregrep tool utilizes all of the same syntax you've already used with grep, but implements the functionality that you need.
The parameter -o works just like the grep version if it is bare, but it also accepts a numeric parameter in pcregrep, which indicates which capturing group you want to show.
With this solution there is a bare minimum of change required in the script. You simply replace one modular utility with another and tweak the parameters.
Interesting Note: You can use multiple -o arguments to return multiple capture groups in the order in which they appear on the line.
Not possible in just grep I believe
for sed:
name=`echo $f | sed -E 's/([0-9]+_([a-z]+)_[0-9a-z]*)|.*/\2/'`
I'll take a stab at the bonus though:
echo "$name.jpg"
This is a solution that uses gawk. It's something I find I need to use often so I created a function for it
function regex1 { gawk 'match($0,/'$1'/, ary) {print ary['${2:-'1'}']}'; }
to use just do
$ echo 'hello world' | regex1 'hello\s(.*)'
world
str="1w 2d 1h"
regex="([0-9])w ([0-9])d ([0-9])h"
if [[ $str =~ $regex ]]
then
week="${BASH_REMATCH[1]}"
day="${BASH_REMATCH[2]}"
hour="${BASH_REMATCH[3]}"
echo $week --- $day ---- $hour
fi
output:
1 --- 2 ---- 1
A suggestion for you - you can use parameter expansion to remove the part of the name from the last underscore onwards, and similarly at the start:
f=001_abc_0za.jpg
work=${f%_*}
name=${work#*_}
Then name will have the value abc.
See Apple developer docs, search forward for 'Parameter Expansion'.
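One thing to keep in mind with this approach is that % and # remove the shortest match, so a name with extra underscores keeps everything between the first and the last one. A small sketch, with a hypothetical helper middle_part and a second made-up filename:

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping the two expansions from the answer above.
middle_part() {
  local f=$1
  local work=${f%_*}            # drop the last underscore and everything after it
  printf '%s\n' "${work#*_}"    # drop everything up to the first underscore
}

middle_part 001_abc_0za.jpg
middle_part 001_abc_def_0za.jpg
```

The first call prints abc; the second prints abc_def, which may or may not be what is wanted for such names.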
I prefer a one-line python or perl command; both are often included in major Linux distributions.
echo $'
<a href="http://stackoverflow.com">
</a>
<a href="http://google.com">
</a>
' | python -c $'
import re
import sys
for i in sys.stdin:
    # Capture the href value on lines that contain one.
    g = re.match(r\'.*href="(.*)"\', i)
    if g is not None:
        print(g.group(1))
'
and to handle files:
ls *.txt | python -c $'
import sys
import re
for i in sys.stdin:
    i = i.strip()
    f = open(i, "r")
    for j in f:
        g = re.match(r\'.*href="(.*)"\', j)
        if g is not None:
            print(g.group(1))
    f.close()
'
The following example shows how to extract the 3 character sequence from a filename using a regex capture group:
for f in 123_abc_123.jpg 123_xyz_432.jpg
do
echo "f: " $f
name=$( perl -ne 'if (/[0-9]+_([a-z]+)_[0-9a-z]*/) { print $1 . "\n" }' <<< $f )
echo "name: " $name
done
Outputs:
f: 123_abc_123.jpg
name: abc
f: 123_xyz_432.jpg
name: xyz
So the if-regex conditional in perl filters out all non-matching lines; for the lines that do match, it applies the capture group(s), which you can access with $1, $2, ... respectively.
if you have bash, you can use extended globbing
shopt -s extglob
shopt -s nullglob
shopt -s nocaseglob
for file in +([0-9])_+([a-z])_+([a-z0-9]).jpg
do
IFS="_"
set -- $file
echo "This is your captured output : $2"
done
or
ls +([0-9])_+([a-z])_+([a-z0-9]).jpg | while read file
do
IFS="_"
set -- $file
echo "This is your captured output : $2"
done
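Both loops above assign IFS inside the loop body, which changes word splitting for the rest of the script. A possible alternative, sketched here with a made-up filename, is to set IFS only for a single read:

```shell
#!/usr/bin/env bash
# Made-up filename in the number_name_suffix.jpg format.
file=123_abc_0za.jpg

# Prefixing the assignment limits the IFS change to this one read command.
IFS=_ read -r number name rest <<< "$file"

echo "This is your captured output : $name"
```

After the read, IFS still has its previous value, and rest holds everything after the second underscore (here 0za.jpg).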

Resources