Avoid combination of hyphen and space using a regex - ruby

I'm currently writing a very specific regex for a first-name field that has several requirements. One of them is that spaces are not allowed before or after hyphens. For this, I have used a negative lookahead:
(?!.*(\s\-))
as part of the regex:
^(?!ß)(?!.*(\s\-))(?!(.)\1{2})(?!.*\s{2})(?!.*\'{2})(?!.*\-{2})[a-zA-ZßöüäÜÖÄ\s\-\']{2,30}(?<![\s\-])$
It does return a mismatch for:
asdf -asdf
but not for:
asdf- asdf
The latter also needs to return an error. What am I missing?

You also have to assert that the other combination, a hyphen followed by whitespace, is absent from your string:
(?!.*(\s\-))(?!.*(\-\s))

You can rewrite your pattern more simply, in a way that avoids many of these problems and makes it more efficient. For example:
^(?=.{2,30}$)(?!(.)\1{2})[a-zA-ZöüäÜÖÄ]+(?:[-'\s][a-zA-ZßöüäÜÖÄ]+)*$
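A quick Ruby check of the rewritten pattern, using the answer's regex unchanged; the sample names are made up for illustration. Because each separator (-, ' or whitespace) has to sit between two runs of letters, both "asdf -asdf" and "asdf- asdf" are rejected without any extra lookahead.

```ruby
# The answer's pattern, unchanged. A name is a run of letters, optionally
# followed by groups of exactly one separator ("-", "'" or whitespace)
# plus another run of letters, so a separator can never touch a space.
SIMPLE_NAME_RE = /^(?=.{2,30}$)(?!(.)\1{2})[a-zA-ZöüäÜÖÄ]+(?:[-'\s][a-zA-ZßöüäÜÖÄ]+)*$/

["asdf-asdf", "O'Brien", "asdf -asdf", "asdf- asdf", "aaab"].each do |name|
  puts "#{name.inspect} => #{SIMPLE_NAME_RE.match?(name)}"
end
```

"aaab" is rejected by (?!(.)\1{2}), which only looks at the first three characters.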

Simplest is probably a negative lookahead right after the ^:
/^(?!.*(\s-|-\s))#{main_pattern}/
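A minimal Ruby sketch of that approach. MAIN_PATTERN is a hypothetical stand-in for the rest of the first-name rules (only the allowed characters and the 2-30 length), and \A and \z are used instead of ^ and $ because in Ruby ^ and $ match at every line boundary. The lookahead uses a non-capturing group (?:...) so it does not take up a capture-group number: in the question's pattern, (\s\-) is capture group 1, so the later \1 refers to that group rather than to (.).

```ruby
# Hypothetical stand-in for the remaining first-name rules.
MAIN_PATTERN = /[a-zA-ZßöüäÜÖÄ\s'-]{2,30}\z/

# One negative lookahead rejects both "whitespace then hyphen"
# and "hyphen then whitespace" anywhere in the string.
NAME_RE = /\A(?!.*(?:\s-|-\s))#{MAIN_PATTERN}/

def valid_name?(name)
  NAME_RE.match?(name)
end

["asdf-asdf", "asdf -asdf", "asdf- asdf"].each do |name|
  puts "#{name.inspect}: #{valid_name?(name)}"
end
```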

Related

simple symbol regex solution

The problem I'm looking at says an input is valid only if every letter in the string is surrounded by '+' symbols, so "+d++" and "+d+==+a+" are valid, but these are not:
"f++d+"
"3+a=+b+"
"++d+=c+"
I tried to solve this with a regex, since it's essentially a string pattern-matching problem: /(+[a-z][^+])|([^+.][a-z]+)/ but this does not cover patterns where the letters are at the beginning or end of the string. I need help with something more comprehensive.
You could try the following:
/^\+{0,2}[a-z0-9]+\+{0,2}(=*\+{0,2}[a-z0-9]+\+{0,2})*$/
Note that the leading \+{0,2} can match zero characters, so this pattern also accepts "f++d+".
You could use the regex below:
^(?:[^\w\n]*\+[a-z]+\+)+[^\w\n]*$
If you also want to match +f+g+ (where a single + sits between two letters), put the trailing + inside a positive lookahead assertion:
^(?:[^\w\n]*\+[a-z]+(?=\+))+[^\w\n]*$
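A small Ruby harness for the lookahead variant, run against the examples from the question plus +f+g+. The helper name plus_covered? is hypothetical; the pattern is the answer's, unchanged. Because ^ and $ are line anchors in Ruby, this sketch assumes single-line input.

```ruby
# Each run of letters must be preceded by "+"; the "+" after it is only
# peeked at with (?=\+), so "+f+g+" can share the middle "+".
PLUS_RE = /^(?:[^\w\n]*\+[a-z]+(?=\+))+[^\w\n]*$/

def plus_covered?(str)
  PLUS_RE.match?(str)
end

%w[+d++ +d+==+a+ +f+g+ f++d+ 3+a=+b+ ++d+=c+].each do |s|
  puts "#{s}: #{plus_covered?(s)}"
end
```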

Regex for matching everything before trailing slash, or first question mark?

I'm trying to come up with a regex that will elegantly match everything in a URL AFTER the domain name and before the first ?, the last slash, or the end of the URL if neither of the two exists.
This is what I came up with but it seems to be failing in some cases:
regex = /[http|https]:\/\/.+?\/(.+)[?|\/|]$/
In summary:
http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price/ should return
2013/07/31/a-new-health-care-approach-dont-hide-the-price
http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price?id=2 should return
2013/07/31/a-new-health-care-approach-dont-hide-the-price
http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price should return
2013/07/31/a-new-health-care-approach-dont-hide-the-price
Please don't use a regex for this. Use the URI library:
require 'uri'
str_you_want = URI("http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price").path
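URI#path keeps the leading slash, and the trailing slash when the URL has one, so on its own it is not exactly the output asked for. A sketch of trimming those, assuming Ruby 2.5+ for String#delete_prefix and String#delete_suffix; path_after_domain is a hypothetical helper name:

```ruby
require 'uri'

# URI#path drops the scheme, host and query string ("?id=2"),
# but keeps the leading "/" and any trailing "/", so trim both.
def path_after_domain(url)
  URI(url).path.delete_prefix("/").delete_suffix("/")
end

[
  "http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price/",
  "http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price?id=2",
  "http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price"
].each { |url| puts path_after_domain(url) }
```

All three URLs print 2013/07/31/a-new-health-care-approach-dont-hide-the-price.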
Why?
See everything about this famous question for a good discussion of why these kinds of things are a bad idea.
Also, this XKCD really says why:
In short, regexes are incredibly powerful tools, but when you're dealing with things defined by hundred-page, convoluted standards and there is already a library that does the job faster, more easily, and more correctly, why reinvent the wheel?
If lookaheads are allowed:
((2[0-9][0-9][0-9].*)(?=\?\w+)|(2[0-9][0-9][0-9].*)(?=/\s+)|(2[0-9][0-9][0-9].*).*\w)
Copy + Paste this in http://regexpal.com/
See here with ruby regex tester: http://rubular.com/r/uoLLvTwkaz
[Image: the pattern in a JavaScript regex tester; it works out the same.]
(?=...) is just a lookahead.
I set up three alternatives, each matching from 2XXX up to (in this order):
(?=\?\w+) # lookahead for a question mark followed by one or more word characters
(?=/\s+) # lookahead for a slash followed by one or more whitespace characters
.*\w # match up to the last word character
I'm pretty sure some of the parentheses are not needed; I just copied and pasted.
There are two OR (|) operators in the (A|B|C) expression. The order matters because it works like an if / else-if / else chain: at a given starting position, the alternatives are tried left to right and the first one that succeeds wins.
You can probably adapt the prefix; I just assumed you wanted the match to start at 2XXX, where X is a digit.
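The same pattern can be checked directly in Ruby. This sketch writes it with %r{...} so the / inside (?=/\s+) needs no escaping, and uses String#[] to return the whole match for each of the three URLs from the question. As the answer says, it assumes the path starts with 2XXX.

```ruby
# The answer's pattern, unchanged apart from the %r{} delimiters.
YEAR_PATH_RE = %r{((2[0-9][0-9][0-9].*)(?=\?\w+)|(2[0-9][0-9][0-9].*)(?=/\s+)|(2[0-9][0-9][0-9].*).*\w)}

urls = [
  "http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price/",
  "http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price?id=2",
  "http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price"
]

# String#[] with a Regexp returns the whole match (or nil if there is none).
urls.each { |url| puts url[YEAR_PATH_RE] }
```

For the trailing-slash URL, the third alternative backtracks until .*\w ends on the final "e", which is what drops the slash.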
Also, save the pitchforks, everyone: regular expressions are not always the best tool, but they are there for you when you need them.
Also, there is an xkcd (https://xkcd.com/208/) for everything:

escape sequence \K for regular expression in boost library

I need to replace a lookbehind expression with \K in Boost (version 1.54) because of Boost's lookbehind limitation, but it does not work. How can I do it, or what is the problem? Is there any other way to rewrite this expression using a lookahead?
"(?<=foo.*) bar" => "foo.*\K bar" ???
Bit of a late answer here...
According to the Boost.Regex 1.54 documentation, Perl's \K is supported, and I have just confirmed it by testing in Sublime Text 3, which uses Boost.Regex as its regex search engine. Furthermore, I see no obvious syntax error in either of the forms you posted. The only thing I can think of is that you're using the regex inside a string literal and haven't escaped the \. If that's the case, the correct regex for your example would be:
foo.*\\K bar
If that's not the case, one workaround (that obviously has performance implications) is to reverse the string, and then use a variable-width look-ahead.
The modified regex for your example would then be:
rab (?=.*oof)
I believe the problem is that Boost lookbehind patterns must be of fixed length.
Your expression contains the repeat .*, which makes it variable-length.
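For comparison, here is the same idea in Ruby, whose Onigmo engine also supports \K. This is not Boost, so treat it as an illustration of the two techniques from the answers (\K, and reversing the string to use a lookahead) rather than as a Boost test. The sample text "foo abc bar" and the replacement "baz" are made up.

```ruby
text = "foo abc bar"

# \K: everything matched before it ("foo abc") is dropped from the reported
# match, so only " bar" is replaced.
with_k = text.sub(/foo.*\K bar/, " baz")

# Reverse trick: reverse the string, use a variable-width lookahead,
# then reverse the result back.
reversed = text.reverse.sub(/rab (?=.*oof)/, "zab ").reverse

puts with_k    # → foo abc baz
puts reversed  # → foo abc baz

# Ruby's engine also restricts lookbehind; check how it treats .* inside one.
begin
  Regexp.new('(?<=foo.*) bar')
  puts "variable-length lookbehind accepted"
rescue RegexpError => e
  puts "variable-length lookbehind rejected: #{e.message}"
end
```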

ruby regex match any character besides a specific one

I am looking for a way to match any character except, for example, a "#".
It would look something like...
gsub(/^foo.*foo$/)
But I'd want it to match
"foofdfdfdfoo"
But not
"fooddgdgd#fdfoo"
Thanks.
^[^#]+$
http://rubular.com/r/glijo99dU9
gsub is for substitution. If you just want to match, use the .match method.
To expand on Explosion Pills' answer: a caret (^) placed first inside a character class negates the class, so [^#] matches any single character except #. (Outside a character class, ^ is an anchor for the start of a line.) You can read more about it in the documentation.
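Putting the two pieces together in Ruby: the foo...foo shape from the question with [^#]* in the middle. \A and \z anchor the whole string; this is a sketch, and FOO_RE is a hypothetical name.

```ruby
# [^#] is a negated character class: any one character except "#".
# \A and \z anchor the whole string (in Ruby, ^ and $ anchor lines).
FOO_RE = /\Afoo[^#]*foo\z/

puts FOO_RE.match?("foofdfdfdfoo")     # → true
puts FOO_RE.match?("fooddgdgd#fdfoo")  # → false
```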

Multi-Line Regex: Find A where B is absent

I have been reading a lot about regex lately and have seen many answers about matching one word where a second word is absent. I have seen a lot of regex examples where I can search for a given word (or any more complex regex in its place) and find where another word is missing.
It seems like this approach works very well on a line-by-line basis, but after enabling multi-line mode it still doesn't seem to match properly.
Example: match an entire file string where the word foo is included but the word bar is absent from the file. What I have so far is (?m)^(?=.*?(foo))((?!bar).)*$, which is based on the example link. I have been testing with a Ruby regex tester, but I think it is an open-ended regex question. It seems to match smaller pieces; I would like it to either match or not match the entire string as one big chunk.
In the example above, matches seem to be found on a line-by-line basis. What changes need to be made to the regex so it applies to the ENTIRE string?
EDIT: I know there are other, more efficient ways to solve this problem that don't involve a regex. I am not looking for a solution using other means; I am asking from a theoretical regex point of view. The regex engine has a multi-line mode (which seems to "work"), and it has negative/positive lookaheads that can be combined on a line-by-line basis, so why doesn't combining these two principles yield the expected result?
Sawa's answer can be simplified: all that's needed is a positive lookahead and a negative lookahead, and since you're in multiline mode, .* takes care of the rest:
/(?=.*foo)(?!.*bar).*/m
In Ruby, multiline mode means that . also matches \n, and matches are greedy, so the whole string will match without the need for anchors.
Update
@Sawa makes a good point: the \A is necessary, but the \Z is not.
Actually, looking at it again, the positive lookahead seems unnecessary:
/\A(?!.*bar).*foo.*/m
A regex that matches an entire string that does not include bar is:
/\A(?!.*bar.*).*\z/m
and a regex that matches from the beginning of an entire string that includes foo is:
/\A.*foo/m
Since you want to satisfy both of these, take their conjunction by putting one of them in a lookahead:
/\A(?=.*foo)(?!.*bar.*).*\z/m
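A Ruby sketch comparing the anchored conjunction (foo required, bar forbidden, as in the question) with the unanchored version from the first answer, on made-up sample strings. Without \A, the engine can retry the lookaheads at a later offset, step past the "b" of "bar", and match the rest of the string, which is why the \A matters.

```ruby
# Anchored conjunction: both lookaheads are checked once, at the start of the
# string. The /m flag makes "." match "\n" in Ruby, so ".*" can span lines.
ANCHORED = /\A(?=.*foo)(?!.*bar).*\z/m

# Unanchored version: if the lookaheads fail at offset 0, the engine retries
# them at later offsets.
UNANCHORED = /(?=.*foo)(?!.*bar).*/m

puts ANCHORED.match?("first line\nhas foo\nlast line")  # → true
puts ANCHORED.match?("bar\nfoo")                        # → false
p("bar\nfoo" =~ UNANCHORED)                             # → 1 (match starts at "ar\nfoo")
```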

Resources