How can I simplify this regular expression?

The format I'm trying to match is:
# (Apple push notification codes)
"11a735e9 9f696c2f 700b2700 728042c6 137eeb7a 8442c27d 40e59d9e 3c7e0de7"
The simplest expression I can think of is: /((\w{8}\s){7}\w{8})/i
Can anyone think of a simpler one?
(I'm using Ruby regular expressions)
UPDATE - thanks to user1096188, I've removed \d - this is included in \w

You can detect a word boundary using \b, and use (?: to prevent capturing groups
/(?:\w{8}\b\s?){8}/

You could do this if the end of the match is the end of the whole string.
(?:\w{8}(?:\s|$)){8}

Taking @zapthedingbat's solution one stage further, it looks like the code only contains hexadecimal characters (0-9 and a-f) and spaces, so you could sacrifice a little simplicity for accuracy.
I'm making an assumption, but I suspect letters g to z are invalid.
If the format is hexadecimal only (you should check Apple's documentation to be sure), a tighter match would be:
/(?:[0-9a-f]{8}\b\s?){8}/
EDIT
In fact, in Ruby, it looks like you should be able to do:
/(?:\h{8}\b\s?){8}/
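As a minimal sketch of how the \h version could be used to validate a whole token, assuming the token is always eight groups of eight hex digits separated by single spaces (that format is inferred from the example in the question, not from Apple's documentation). The constant and method names are made up for illustration:

```ruby
# Hypothetical validator: eight groups of eight hex digits, single spaces between.
# \A and \z anchor to the whole string; \h is Ruby's hex-digit class.
TOKEN_PATTERN = /\A\h{8}(?: \h{8}){7}\z/

def valid_token?(candidate)
  TOKEN_PATTERN.match?(candidate)
end

sample = "11a735e9 9f696c2f 700b2700 728042c6 137eeb7a 8442c27d 40e59d9e 3c7e0de7"
puts valid_token?(sample)                   # full token from the question
puts valid_token?(sample.sub("11a", "11g")) # contains a non-hex letter
```

Anchoring with \A and \z means a string with extra groups or trailing junk is rejected rather than partially matched.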

> "11a735e9 9f696c2f 700b2700 728042c6 137eeb7a 8442c27d 40e59d9e 3c7e0de7".match(/(?:\w{8}\s?)+/)
> $&
=> "11a735e9 9f696c2f 700b2700 728042c6 137eeb7a 8442c27d 40e59d9e 3c7e0de7"

Related

Avoid combination of hyphen and space using a regex

I'm currently writing a very specific regex for a firstname field, that has several requirements. One of them is that spaces are not allowed before or after hyphens. For this, I have used a negative lookahead:
(?!.*(\s\-))
as part of the regex:
^(?!ß)(?!.*(\s\-))(?!(.)\1{2})(?!.*\s{2})(?!.*\'{2})(?!.*\-{2})[a-zA-ZßöüäÜÖÄ\s\-\']{2,30}(?<![\s\-])$
It does return a mismatch for:
asdf -asdf
but not for:
asdf- asdf
The latter also need to return an error. What am I missing?

You also have to assert that the other combination, a hyphen followed by whitespace, is absent from your string:
(?!.*(\s\-))(?!.*(\-\s))

You can rewrite your pattern in a simpler way that avoids many of these problems and is more efficient, for example:
^(?=.{2,30}$)(?!(.)\1{2})[a-zA-ZöüäÜÖÄ]+(?:[-'\s][a-zA-ZßöüäÜÖÄ]+)*$

Simplest is probably a negative lookahead right after the ^:
/^(?!.*(\s-|-\s))#{main_pattern}/
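A rough sketch of the single-lookahead idea in Ruby. The character class below stands in for main_pattern and is only a simplified assumption (letters, umlauts, spaces, hyphens and apostrophes, 2 to 30 characters); the other rules from the question are left out to keep the example short:

```ruby
# Simplified stand-in for the full first-name pattern (assumption, not the
# asker's complete rule set). The lookahead rejects "space-hyphen" and
# "hyphen-space" anywhere in the string.
NAME_PATTERN = /\A(?!.*(?:\s-|-\s))[a-zA-ZßöüäÜÖÄ\s\-']{2,30}\z/

def acceptable_name?(name)
  NAME_PATTERN.match?(name)
end

["asdf -asdf", "asdf- asdf", "Anna-Lena", "Hans Peter"].each do |name|
  puts "#{name.inspect}: #{acceptable_name?(name)}"
end
```

Both problem inputs from the question are rejected by the same lookahead, so the two separate lookaheads are not needed.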

Regex for matching everything before trailing slash, or first question mark?

I'm trying to come up with a regex that will elegantly match everything in a URL AFTER the domain name, and before the first ?, the last slash, or the end of the URL, if neither of the two exists.
This is what I came up with but it seems to be failing in some cases:
regex = /[http|https]:\/\/.+?\/(.+)[?|\/|]$/
In summary:
http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price/ should return
2013/07/31/a-new-health-care-approach-dont-hide-the-price
http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price?id=2 should return
2013/07/31/a-new-health-care-approach-dont-hide-the-price
http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price should return
2013/07/31/a-new-health-care-approach-dont-hide-the-price

Please don't use Regex for this. Use the URI library:
require 'uri'
str_you_want = URI("http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price").path
Why?
See everything about this famous question for a good discussion of why these kinds of things are a bad idea.
Also, this XKCD really says why:
In short, regexes are an incredibly powerful tool, but when you're dealing with things defined by hundred-page convoluted standards, and there is already a library that does the job faster, more easily, and more correctly, why reinvent the wheel?
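One thing to watch: URI#path keeps the leading slash (and a trailing one, if the URL has it), so it does not return exactly the strings listed in the question. A small sketch of one way to trim them; the helper name is made up for illustration:

```ruby
require 'uri'

# Hypothetical helper: take the path component and drop one leading
# and one trailing slash, if present. The query string is never part of #path.
def path_without_slashes(url)
  URI(url).path.sub(%r{\A/}, '').sub(%r{/\z}, '')
end

base = "http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price"
[base, "#{base}/", "#{base}?id=2"].each do |url|
  puts path_without_slashes(url)
end
```

All three URLs from the question should print the same path, since the query string is parsed separately by URI.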

If lookaheads are allowed
((2[0-9][0-9][0-9].*)(?=\?\w+)|(2[0-9][0-9][0-9].*)(?=/\s+)|(2[0-9][0-9][0-9].*).*\w)
Copy + Paste this in http://regexpal.com/
See here with ruby regex tester: http://rubular.com/r/uoLLvTwkaz
Image using javascript regex, but it works out the same
(?=) is just a lookahead
I basically set up three matches from 2XXX up to (in this order):
(?=\?\w+) # lookahead for a question mark followed by one or more word characters
(?=/\s+) # lookahead for a slash followed by one or more whitespace characters
.*\w # match up to the last word character
I'm pretty sure that some parentheses were not needed but I just copy pasted.
There are essentially two OR | expressions in the (A|B|C) expression. The order matters since it's like a (ifthen|elseif|else) type deal.
You can probably work out the prefix yourself; I just assumed that you wanted to match 2XXX, where X is a digit.
Also, save the pitchforks, everyone: regular expressions are not always the best tool, but they're there for you when you need them.
Also, there is xkcd (https://xkcd.com/208/) for everything:

how to use regex negation string

Can anybody tell me how to use a regex to negate a string?
I want to find all lines that start with public class, followed by anything except first or second, and then anything else.
For example, in the results I expect to see public class base but not public class myfirst:base.
Can anybody help me, please?

Use a negative lookahead:
public\s+class\s+(?!first|second).+

If Peter is correct and you're using Visual Studio's Find feature, this should work:
^:b*public:b+class:b+~(first|second):i.*$
:b matches a space or tab
~(...) is how VS does a negative lookahead
:i matches a C/C++ identifier
The rest is standard regex syntax:
^ for beginning of line
$ for end of line
. for any character
* for zero or more
+ for one or more
| for alternation

Both the other two answers come close, but probably fail for different reasons.
public\s+class\s+(?:(?!first|second).)+
Note how there is a (non-capturing) group around the negative lookahead, to ensure it applies to more than just the first position.
That group is also less restrictive: since . already excludes newlines, it uses . instead of \S, and the trailing $ is not needed. The pattern excludes the specified words and matches the others.
No slashes wrapping the expression since those aren't required in everything and may confuse people that have only encountered string-based regex use.
If this still fails, post the exact content that is wrongly matched or missed, and what language/ide you are using.
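To make the difference between the two lookahead placements concrete, here is a small Ruby sketch (Ruby is only a stand-in for any engine with standard lookahead syntax; the constant names are made up). One caveat that is my own observation rather than part of the answers: in Ruby an unanchored pattern can still match the part of the line before the excluded word, so both patterns below are anchored with ^ and $ to test whole lines:

```ruby
# Lookahead checked once, right after "class":
# rejects only names that START with first/second.
STARTS_WITH = /^public\s+class\s+(?!first|second).+$/

# Lookahead checked before every character:
# rejects names that CONTAIN first/second anywhere.
CONTAINS = /^public\s+class\s+(?:(?!first|second).)+$/

lines = [
  "public class base",
  "public class first",
  "public class myfirst:base"
]

starts_with_hits = lines.grep(STARTS_WITH)
contains_hits    = lines.grep(CONTAINS)

puts "single lookahead:   #{starts_with_hits.inspect}"
puts "repeated lookahead: #{contains_hits.inspect}"
```

Only the repeated form filters out public class myfirst:base, which is the case the question asks about.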
Update:
Turns out you're using Visual Studio, which has its own special regex implementation, for some unfathomable reason. So, you'll be wanting to try this instead:
public:b+class:b+~(first|second)+$
I have no way of testing that - if it doesn't work, try dropping the $, but otherwise you'll have to find a VS user. Or better still, the VS engineer(s) responsible for this stupid non-standard regex.

Here is something that should work for you
/public\sclass\s(?:[^fs\s]+|(?!first|second)\S)+(?=\s|$)/
The second lookahead could be changed to $ (end of line) or another anchor that works for your particular use case, such as '{'.
Edit: Try changing the last part to:
(?=\s|$)

Ruby gsub : is there a better way

I need to remove all leading and trailing non-numeric characters. This is what I came up with. Is there a better implementation?
puts s.gsub(/^\D+/,'').gsub(/\D+$/,'')

Instead of eliminating what you don't want, it's often clearer to select what you do want (using parentheses). Also, this only requires one regex evaluation:
s.match(/^\D*(.*?)\D*$/)[1]
Or, this convenient shorthand:
s[/^\D*(.*?)\D*$/, 1]
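A quick sketch comparing the capture approach with the original double gsub. It assumes the input is a single string to be trimmed as a whole, so it swaps ^ and $ (which match at every line in Ruby) for \A and \z and adds the m flag so . also matches newlines; the method names are made up for illustration:

```ruby
# Capture everything between the leading and trailing runs of non-digits.
# \A / \z anchor to the whole string (an assumption about the intended input).
def strip_non_digits(s)
  s[/\A\D*(.*?)\D*\z/m, 1]
end

# The approach from the question, for comparison.
def strip_non_digits_gsub(s)
  s.gsub(/^\D+/, '').gsub(/\D+$/, '')
end

["abc123def456xyz", "$12.50 USD", "42"].each do |input|
  puts "#{input.inspect} -> #{strip_non_digits(input).inspect}"
end
```

Non-digits between the first and last digit are kept in both versions, which matches the "leading and trailing" wording of the question.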

Perhaps a single #gsub(/(^\D+)|(\D+$)/, '')
Also, when in doubt Rubular it.

gsub partial replace

I would like to replace only the group in parenthesis in this expression :
my_string.gsub(/<--MARKER_START-->(.)*<--MARKER_END-->/, 'replace_text')
so that I get : <--MARKER_START-->replace_text<--MARKER_END-->
I know I could repeat the whole MARKER_START and MARKER_END blocks in the substitution expression, but I thought there should be a simpler way to do this.

You can do it with zero width look-ahead and look-behind assertions.
This regex should work in ruby 1.9 and in perl and many other places:
Note: ruby 1.8 only supports look-ahead assertions. You need both look-ahead and look-behind to do this properly.
s.gsub( /(?<=<--MARKER_START-->).*?(?=<--MARKER_END-->)/, 'replacement text' )
What happens in ruby 1.8 is that the ?<= causes it to crash, because it doesn't understand the look-behind assertion. For that part, you then have to fall back to using a backreference, as Greig Hewgill mentions,
so what you get is
s.gsub( /(<--MARKER_START-->).*?(?=<--MARKER_END-->)/, '\1replacement text' )
EXPLANATION THE FIRST:
I've replaced the (.)* in the middle of your regex with .*? - this is non-greedy.
If you don't have non-greedy, then your regex will try and match as much as it can - if you have 2 markers on one line, it goes wrong. This is best illustrated by example:
"<b>One</b> Two <b>Three</b>".gsub( /<b>.*<\/b>/, 'BOLD' )
=> "BOLD"
What we actually want:
"<b>One</b> Two <b>Three</b>".gsub( /<b>.*?<\/b>/, 'BOLD' )
=> "BOLD Two BOLD"
EXPLANATION THE SECOND:
zero-width-look-ahead-assertion sounds like a giant pile of nerdly confusion.
What "look-ahead-assertion" actually means is "Only match if the thing we are looking for is followed by this other stuff."
For example, only match a digit, if it is followed by an F.
"123F" =~ /\d(?=F)/ # will match the 3, but not the 1 or the 2
What "zero width" actually means is "consider the 'followed by' in our search, but don't count it as part of the match when doing replacement or grouping or things like that."
Using the same example of 123F, If we didn't use the lookahead assertion, and instead just do this:
"123F" =~ /\dF/ # will match 3F, because F is considered part of the match
As you can see, this is ideal for checking for our <--MARKER END-->, but what we need for the <--MARKER START--> is the ability to say "Only match, if the thing we are looking for FOLLOWS this other stuff". That's called a look-behind assertion, which ruby 1.8 doesn't have for some strange reason.
Hope that makes sense :-)
PS: Why use lookahead assertions instead of just backreferences? If you use lookahead, you're not actually replacing the <--MARKER--> bits, only the contents. If you use backreferences, you are replacing the whole lot. I don't know if this incurs much of a performance hit, but from a programming point of view it seems like the right thing to do, as we don't actually want to be replacing the markers at all.
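Putting the lookaround version together for Ruby 1.9 or later, here is a minimal sketch. The marker strings are taken from the question (with underscores); the constant and method names are made up, and the m flag is an assumption so that content spanning several lines is also replaced:

```ruby
START_MARKER = '<--MARKER_START-->'
END_MARKER   = '<--MARKER_END-->'

# Replace only the text between the markers; the markers themselves are
# matched by zero-width assertions and therefore left untouched.
def replace_between_markers(text, replacement)
  pattern = /(?<=#{Regexp.escape(START_MARKER)}).*?(?=#{Regexp.escape(END_MARKER)})/m
  # Block form, so backslashes in the replacement are not treated as backreferences.
  text.gsub(pattern) { replacement }
end

input = "a<--MARKER_START-->old<--MARKER_END-->b"
puts replace_between_markers(input, "replace_text")
```

Because the quantifier is non-greedy, each START/END pair in the string is handled separately rather than everything from the first START to the last END being swallowed.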

You could do something like this:
my_string.gsub(/(<--MARKER_START-->)(.*)(<--MARKER_END-->)/, '\1replace_text\3')
