Ruby gsub : is there a better way - ruby

I need to remove all leading and trailing non-numeric characters. This is what I came up with. Is there a better implementation.
puts s.gsub(/^\D+/,'').gsub(/\D+$/,'')

Instead of eliminating what you don't want, it's often clearer to select what you do want (using parentheses). Also, this only requires one regex evaluation:
s.match(/^\D*(.*?)\D*$/)[1]
Or, this convenient shorthand:
s[/^\D*(.*?)\D*$/, 1]

Perhaps a single #gsub(/(^\D+)|(\D+$)/, '')
Also, when in doubt Rubular it.

Related

Escape whitespace characters in Ruby

I have a string:
"hello\n\nsomeletters\t\nmoreletters\n"
What I want:
"hello\\n\\nsomeletters\\t\\nmoreletters\\n"
How to do it?
I know a gsub way. But it sounds very simple and seems to be a common problem therefore I am sure that Ruby Gods have already sent us a solution.
There are different possibilities. The closest to what you want would be Regexp#escape:
Regexp.escape "hello\n\nsomeletters\t\nmoreletters\n"
#⇒ "hello\\n\\nsomeletters\\t\\nmoreletters\\n"
But be aware it will escape some other symbols having a special meaning in regular expressions.
Also, we have Shellwords#escape, which is probably not what you want here.
For escaping backslashes only there is no dedicated method because this operation basically has a little sense and it is not worth it to call it instead of:
"hello\n\nsomeletters\t\nmoreletters\n".gsub(
/\n|\t/, {"\n" => "\\n", "\t" => "\\t"}
)
Please note, there are no slash characters in the initial string, hence you are to match all the expected sequences.

delete matched characters using regex in ruby

I need to write a regex for the following text:
"How can you restate your point (something like: \"<font>First</font>\") as a clear topic?"
that keeps whatever is between the
\" \"
characters (in this case <font>First</font>
I came up with this:
/"How can you restate your point \(something like: |\) as a clear topic\?"/
but how do I get ruby to remove the unwanted surrounding text and only return <font>First</font>?
lookbehind, lookahead and making what is greedy, lazy.
str[/(?<=\").+?(?=\")/] #=> "<font>First</font>"
If you have strings just like that, you can .split and get the first:
> str.split(/"/)[1]
=> "<font>First</font>"
You certainly can use a regular expression, but you don't need to:
str = "How can you restate (like: \"<font>First</font>\") as a clear topic?"
str[str.index('"')+1...str.rindex('"')]
#=> "<font>First</font>"
or, for those like me who never use three dots:
str[str.index('"')+1..str.rindex('"')-1]

Avoid combination of hyphen and space using a regex

I'm currently writing a very specific regex for a firstname field, that has several requirements. One of them is that spaces are not allowed before or after hyphens. For this, I have used a negative lookahead:
(?!.*(\s\-))
as part of the regex:
^(?!ß)(?!.*(\s\-))(?!(.)\1{2})(?!.*\s{2})(?!.*\'{2})(?!.*\-{2})[a-zA-ZßöüäÜÖÄ\s\-\']{2,30}(?<![\s\-])$
It does return a mismatch for:
asdf -asdf
but not for:
asdf- asdf
The latter also need to return an error. What am I missing?
You have to assert the other combination of hyphens and whitespaces absent in your string also:
(?!.*(\s\-))(?!.*(\-\s))
You can rewrite your pattern in a more simple way that avoids many problems and makes your pattern more efficient, example:
^(?=.{2,30}$)(?!(.)\1{2})[a-zA-ZöüäÜÖÄ]+(?:[-'\s][a-zA-ZßöüäÜÖÄ]+)*$
Simplest is probably a negative lookahead right after the ^:
/^(?!.*(\s-|-\s))#{main_pattern}/

Regex for matching everything before trailing slash, or first question mark?

I'm trying to come up with a regex that will elegantly match everything in an URL AFTER the domain name, and before the first ?, the last slash, or the end of the URL, if neither of the 2 exist.
This is what I came up with but it seems to be failing in some cases:
regex = /[http|https]:\/\/.+?\/(.+)[?|\/|]$/
In summary:
http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price/ should return
2013/07/31/a-new-health-care-approach-dont-hide-the-price
http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price?id=2 should return
2013/07/31/a-new-health-care-approach-dont-hide-the-price
http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price should return
2013/07/31/a-new-health-care-approach-dont-hide-the-price
Please don't use Regex for this. Use the URI library:
require 'uri'
str_you_want = URI("http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price").path
Why?
See everything about this famous question for a good discussion of why these kinds of things are a bad idea.
Also, this XKCD really says why:
In short, Regexes are an incredibly powerful tools, but when you're dealing with things that are made from hundred page convoluted standards when there is already a library for doing it faster, easier, and more correctly, why reinvent this wheel?
If lookaheads are allowed
((2[0-9][0-9][0-9].*)(?=\?\w+)|(2[0-9][0-9][0-9].*)(?=/\s+)|(2[0-9][0-9][0-9].*).*\w)
Copy + Paste this in http://regexpal.com/
See here with ruby regex tester: http://rubular.com/r/uoLLvTwkaz
Image using javascript regex, but it works out the same
(?=) is just a a lookahead
I basically set up three matches from 2XXX up to (in this order):
(?=\?\w+) # lookahead for a question mark followed by one or more word characters
(?=/\s+) # lookahead for a slash followed by one or more whitespace characters
.*\w # match up to the last word character
I'm pretty sure that some parentheses were not needed but I just copy pasted.
There are essentially two OR | expressions in the (A|B|C) expression. The order matters since it's like a (ifthen|elseif|else) type deal.
You can probably fix out the prefix, I just assumed that you wanted 2XXX where X is a digit to match.
Also, save the pitchforks everyone, regular expressions are not always the best but it's there for you when you need it.
Also, there is xkcd (https://xkcd.com/208/) for everything:

How can I simplify this regular expression?

The format I'm trying to match is:
# (Apple push notification codes)
"11a735e9 9f696c2f 700b2700 728042c6 137eeb7a 8442c27d 40e59d9e 3c7e0de7"
The simplest expression I can think of is: /((\w{8}\s){7}\w{8})/i
Can anyone think of a simpler one?
(I'm using Ruby regular expressions)
UPDATE - thanks to user1096188, I've removed \d - this is included in \w
You can detect a word boundary using \b, and use (?: to prevent capturing groups
/(?:\w{8}\b\s?){8}/
You could do this if the end of the match is the end of the whole string.
(\w{8}(:?\s|$)){7}
Taking #zapthedingbat's solution one stage further, it looks like the code only contains hexadecimal characters (0-9 and a-f) and spaces. So you could possibly sacrifice a little simplicity for accuracy.
I'm making an assumption, but I suspect letters g to z are invalid.
If the format is hexadecimal only (you should check Apple's documentation to be sure), a tighter match would be:
/(?:[0-9a-f]{8}\b\s?){8}/
EDIT
In fact, in Ruby, it looks like you should be able to do:
/(?:\h{8}\b\s?){8}/
> "11a735e9 9f696c2f 700b2700 728042c6 137eeb7a 8442c27d 40e59d9e 3c7e0de7".match(/((\w{8}\s)+)/)
> $&
=> "11a735e9 9f696c2f 700b2700 728042c6 137eeb7a 8442c27d 40e59d9e 3c7e0de7"

Resources