Regex to match a specific parenthesis among multiple - ruby

Take the String:
"The only true (wisdom) is in knowing you know (nothing)"
I want to extract the word nothing.
What I know about it:
It will always be inside parentheses
The parenthesized group will always be the last element before the end of the line: $
I first attempted to match it with
/\(.*\)$/, but that obviously returned
(wisdom) is in knowing you know (nothing).

You want to use a negated character class, like [^...]:
s = 'The only true (wisdom) is in knowing you know (nothing)'
s.match(/\(([^)]+)\)$/).captures
In this case, the word nothing ends up in the first capture group, while the regex as a whole matches (nothing), parentheses included. To make nothing itself the entire match, move the parentheses into lookarounds:
s = 'The only true (wisdom) is in knowing you know (nothing)'
s.match(/(?<=\()([^)]+)(?=\)$)/).captures
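To compare the patterns side by side, here is a small sketch (the variable names are mine, not from the question). It uses String#[] with a regex, which returns the whole match, or a single capture group when a group number is passed as the second argument.

```ruby
s = 'The only true (wisdom) is in knowing you know (nothing)'

# Greedy .* starts at the first "(" and runs to the last ")".
greedy = s[/\(.*\)$/]

# [^)]+ cannot cross a ")", so only the final group can reach the $ anchor.
captured = s[/\(([^)]+)\)$/, 1]

# Lookarounds keep the parentheses out of the match itself.
whole = s[/(?<=\()[^)]+(?=\)$)/]

puts greedy    # (wisdom) is in knowing you know (nothing)
puts captured  # nothing
puts whole     # nothing
```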

I would do
s = 'The only true (wisdom) is in knowing you know (nothing)'
s.match(/\(([^)]+)\)$/).captures # => ["nothing"]

You could use scan to find all matches and then take the last one:
str = "The only true (wisdom) is in knowing you know (nothing)"
# With a capture group, scan returns one array per match, hence the flatten.
str.scan(/\((.+?)\)/).flatten.last
#=> "nothing"

You can also anchor with \z, which matches only at the very end of the string. Try:
\([a-z]+\)\z
It is simpler and ignores everything except the final parenthesized word, provided that word consists only of lowercase letters.
Test it here:
http://rubular.com/
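One difference worth illustrating: in Ruby, $ matches at the end of every line, while \z matches only at the very end of the string. A minimal sketch with a made-up two-line string; it uses [^)]+ instead of [a-z]+ so that spaces, digits and capitals inside the parentheses are accepted too.

```ruby
# Hypothetical two-line input, just to show the anchor difference.
multi = "see (first)\nthen (last)"

at_line_end   = multi[/\(([^)]+)\)$/, 1]   # $ also matches before the "\n"
at_string_end = multi[/\(([^)]+)\)\z/, 1]  # \z matches only at the very end

puts at_line_end    # first
puts at_string_end  # last
```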

It's even trickier if there's any chance of nesting. In that case you need some recursion:
"...knowing you know ((almost) nothing)"[/\(((?:[^()]*|\(\g<1>\))*)\)$/, 1]
#=> "(almost) nothing"
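If the recursive pattern feels too opaque, one alternative is to walk backwards from the closing parenthesis and count nesting depth. This is a sketch of a different technique, not the regex above, and last_group is a hypothetical helper name. It assumes the parentheses are balanced.

```ruby
# Returns the contents of the parenthesized group that ends the string,
# or nil if the string does not end with ")" or no matching "(" is found.
def last_group(str)
  return nil unless str.end_with?(')')

  depth = 0
  (str.length - 1).downto(0) do |i|
    case str[i]
    when ')'
      depth += 1
    when '('
      depth -= 1
      # Depth back at zero: this "(" opens the final group.
      return str[(i + 1)...-1] if depth.zero?
    end
  end
  nil
end

puts last_group("...knowing you know ((almost) nothing)")  # (almost) nothing
puts last_group("you know (nothing)")                      # nothing
```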

Look ma, no regex!
s = 'The only true (wisdom) is in knowing you know (nothing)'
r = s.reverse
r[(r.index(')') + 1)...(r.index('('))].reverse
#=> "nothing"

Related

opposite of sub in ruby

I want to replace the content (or delete it) that does not match with my filter.
I think the perfect description would be an opposite sub. I cannot find anything similar in the docs, and I'm not sure how to invert the regex, but I think a method would probably be the more convenient.
An example of how it would work (I've just changed the words to make it more clear)
"bird.cats.dogs".opposite_sub(/(dogs|cats)\.(dogs|cats)/, '')
#=> "cats.dogs"
I hope it's easy enough to understand.
Thanks in advance.
String#[] can take a regular expression as its parameter:
▶ "bird.cats.dogs"[/(dogs|cats)\.(dogs|cats)/]
#⇒ "cats.dogs"
For multiple matches one can use String#scan:
▶ "bird.cats.dogs.bird.cats.dogs".scan /(?:dogs|cats)\.(?:dogs|cats)/
#⇒ ["cats.dogs", "cats.dogs"]
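The scan example switches to non-capturing groups (?:...) for a reason: when the pattern contains capture groups, scan returns the groups instead of the whole match. A quick sketch of the difference (variable names are mine):

```ruby
text = "bird.cats.dogs.bird.cats.dogs"

# Capturing groups: each result is an array of the group contents.
with_groups = text.scan(/(dogs|cats)\.(dogs|cats)/)

# Non-capturing groups: each result is the full matched substring.
without_groups = text.scan(/(?:dogs|cats)\.(?:dogs|cats)/)

p with_groups     # [["cats", "dogs"], ["cats", "dogs"]]
p without_groups  # ["cats.dogs", "cats.dogs"]
```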
So you want to extract the part that matches your regex?
You can use String#slice, for example:
"bird.cats.dogs".slice(/(dogs|cats)\.(dogs|cats)/)
#=> "cats.dogs"
And String#[] does the same.
"bird.cats.dogs"[/(dogs|cats)\.(dogs|cats)/]
#=> "cats.dogs"
You cannot have a single replacement string because the part of the string that matches the regex might not be at the beginning or end of the string, in which case it's not clear whether the replacement string should precede or follow the matching string. I've therefore written the following with two replacement strings, one for pre-match, the other for post_match. I've made this a method of the String class as that's what you've asked for (though I've given the method a less-perfect name :-) )
class String
  def replace_non_matching(regex, replace_before, replace_after)
    # partition returns [pre-match, match, post-match]; only the match is kept.
    _before, match, _after = partition(regex)
    replace_before + match + replace_after
  end
end
r = /(dogs|cats)\.(dogs|cats)/
"birds.cats.dogs.pigs".replace_non_matching(r, "", "")
#=> "cats.dogs"
"birds.cats.dogs".replace_non_matching(r, "snakes.", ".hens")
#=> "snakes.cats.dogs.hens"
"birds.cats.dogs.mice.cats.dogs.bats".replace_non_matching(r, "snakes.", ".hens")
#=> "snakes.cats.dogs.hens"
Regarding the last example, the method could be modified to replace "birds.", ".mice." and ".bats", but in that case three replacement strings would be needed. In general, determining in advance the number of replacement strings needed could be problematic.
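If a single replacement string for every non-matching stretch is acceptable, the multi-match case can be handled without knowing the number of segments in advance. The following is only a sketch under that assumption; replace_all_non_matching is a hypothetical name, and zero-length matches are not handled.

```ruby
# Keeps every match of regex and substitutes `replacement` for each
# non-empty stretch of text between, before, or after the matches.
def replace_all_non_matching(str, regex, replacement)
  result = +""
  pos = 0
  str.scan(regex) do
    m = Regexp.last_match
    result << replacement if m.begin(0) > pos  # text skipped before this match
    result << m[0]
    pos = m.end(0)
  end
  result << replacement if pos < str.length    # trailing non-matching text
  result
end

r = /(dogs|cats)\.(dogs|cats)/
puts replace_all_non_matching("birds.cats.dogs.mice.cats.dogs.bats", r, "-")
# -cats.dogs-cats.dogs-
```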

ruby/regex getting the first letter of each word

I want to get the first letter of each word put together, making something like "I need help" turn into "Inh". I was thinking to trim everything off, then going from there, or grab each first letter right away.
You could simply use split, map and join together here. Note that String#first is provided by ActiveSupport (Rails); in plain Ruby, use chr instead.
string = 'I need help'
result = string.split.map(&:first).join
puts result #=> "Inh"
How about regular expressions? Using split makes you handle the parts of the string you don't need for this problem, and you still need another step (chr) to extract the first letter of each word. That's why I think a regular expression is the better fit here. Note that this also works if the string contains a - or another special character. And, of course, you can append .upcase to get a proper acronym.
string = 'something - something and something else'
string.scan(/\b\w/).join
#=> "ssase"
Alternative solution using regex
string = 'I need help'
result = string.scan(/(\A\w|(?<=\s)\w)/).flatten.join
puts result
This basically says "look for either the first word character or any word character directly preceded by whitespace". Because the pattern contains a capture group, scan returns an array of arrays of matches, which is flattened (made into one array) and joined (made into a string).
string = 'I need help'
result = string.split.map(&:chr).join
puts result
http://ruby-doc.org/core-2.0/String.html#method-i-chr
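To make the difference between the approaches concrete, here is a short comparison in plain Ruby (no ActiveSupport). The second input is a made-up example with a stray dash, where split and scan disagree.

```ruby
phrase = 'I need help'
via_split = phrase.split.map(&:chr).join  # first character of each token
via_scan  = phrase.scan(/\b\w/).join      # first word character of each word

dashed = 'something - something else'
dashed_split = dashed.split.map(&:chr).join  # the lone "-" counts as a token
dashed_scan  = dashed.scan(/\b\w/).join      # "-" is not a word character

puts via_split           # Inh
puts via_scan            # Inh
puts dashed_split        # s-se
puts dashed_scan         # sse
puts dashed_scan.upcase  # SSE
```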

how do I use String.delete to remove '<em>' from a string in Ruby?

I'm sure I can do this with a regex, but I can't find any explanation for this behavior using just normal delete!:
#1.9.2
>> "helllom<em>".delete!"<em>"
=> "hlllo"
The docs don't have anything to say about this. Seems to me that it's treating '<em>' as a set. Where is this documented?
Edit: in my defense I was looking for special treatment of < and > in the docs under delete. Didn't see anything about it and tried google, which also didn't have anything to say about that -- because it doesn't exist.
String#delete is one of those unfortunate methods that is difficult to explain (I have no idea what the use case is). In practice, I've always used gsub with an empty string as the second argument.
'helllom<em>'.gsub '<em>', '' # => "helllom"
Note that String#gsub! has a quirk of its own: it returns nil if it does not alter the string, so you should not depend on its return value. Use gsub if you need the return value; if you want to mutate the string, use gsub! but don't chain anything else on that line.
You cannot use String#delete to remove substrings.
Check the API. It removes all the characters from given parameters from the given string.
In your case it removes all occurrences of e, m, < and >.
Straight from the docs:
delete([other_str]+) → new_str
Returns a copy of str with all characters in the intersection of its
arguments deleted. Uses the same rules for building the set of
characters as String#count.
ex:
"hello".delete "l","lo" #=> "heo"
"hello".delete "lo" #=> "he"
"hello".delete "aeiou", "^e" #=> "hell"
"hello".delete "ej-m" #=> "ho"
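So each argument defines a set of characters, and every character in the intersection of those sets is removed. With a single argument such as "<em>", that means every <, e, m and > in the string.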
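A short sketch contrasting the character-set behaviour of delete with substring removal. String#delete_suffix assumes Ruby 2.5 or later; the variable names are mine.

```ruby
tagged = "helllom<em>"

as_set       = tagged.delete("<em>")         # removes every <, e, m and >
as_substring = tagged.gsub("<em>", "")       # removes the literal substring
suffix_only  = tagged.delete_suffix("<em>")  # removes it only at the end (Ruby 2.5+)

# gsub! returns nil when nothing was replaced, so don't chain on it.
unchanged = "hello".dup.gsub!("z", "")

puts as_set        # hlllo
puts as_substring  # helllom
puts suffix_only   # helllom
p unchanged        # nil
```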

Capture arbitrary string before either '/' or end of string

Suppose I have:
foo/fhqwhgads
foo/fhqwhgadshgnsdhjsdbkhsdabkfabkveybvf/bar
And I want to replace everything that follows 'foo/' up until I either reach '/' or, if '/' is never reached, then up to the end of the line. For the first part I can use a lookbehind like this:
(?<=foo\/).+
And that's where I get stuck. I could match to the second '/' like this:
(?<=foo\/).+(?=\/)
That doesn't help for the first case though. Desired output is:
foo/blah
foo/blah/bar
I'm using Ruby.
Try this regex:
/(?<=foo\/)[^\/]+/
Implementing @Endophage's answer:
def fix_post_foo_portion(string)
  portions = string.split("/")
  index_to_replace = portions.index("foo") + 1
  portions[index_to_replace] = "blah"
  portions.join("/")
end
strings = %w{foo/fhqwhgads foo/fhqwhgadshgnsdhjsdbkhsdabkfabkveybvf/bar}
strings.each { |string| puts fix_post_foo_portion(string) }
I'm not a ruby dev but is there some equivalent of php's explode() so you could explode the string, insert a new item at the second array index then implode the parts with / again... Of course you can match on the first array element if you only want to do the switch in certain cases.
['foo/fhqwhgads', 'foo/fhqwhgadshgnsdhjsdbkhsdabkfabkveybvf/bar'].each do |s|
  puts s.sub(%r|^(foo/)[^/]+(/.*)?|, '\1blah\2')
end
Output:
foo/blah
foo/blah/bar
I'm too tired to think of a nicer way to do it but I'm sure there is one.
Checking for the end-of-string anchor -- $ -- as well as the / character should do the trick. You'll also need to make the .+ non-greedy by changing it to .+? since the greedy version will always match right up to the end of the string, given the chance.
(?<=foo\/).+?(?=\/|$)
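For reference, here is a sketch applying both lookbehind patterns from this thread (the negated class and the lazy quantifier with a lookahead) to the two sample strings via String#sub. The variable names are mine.

```ruby
paths = ['foo/fhqwhgads', 'foo/fhqwhgadshgnsdhjsdbkhsdabkfabkveybvf/bar']

# Negated class: consume everything up to the next "/" or the end.
negated = paths.map { |path| path.sub(/(?<=foo\/)[^\/]+/, 'blah') }

# Lazy quantifier: stop as soon as a "/" or the end of the line follows.
lazy = paths.map { |path| path.sub(/(?<=foo\/).+?(?=\/|$)/, 'blah') }

puts negated  # foo/blah, then foo/blah/bar
puts lazy     # foo/blah, then foo/blah/bar
```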

Capitalization of strings

Let us imagine, that we have a simple abstract input form, whose aim is accepting some string, which could consist of any characters.
string = "mystical characters"
We need to process this string by making first character uppercased. Yes, that is our main goal. Thereafter we need to display this converted string in some abstract view template. So, the question is: do we really need to check whether the first character is already written correctly (uppercased) or we are able to write just this?
theresult = string.capitalize
=> "Mystical characters"
Which approach is better: check and then capitalize (if needed), or force capitalization?
Check first whether you need to process anything, because String#capitalize doesn't only convert the first character to uppercase; it also converts all other characters to lowercase. So:
"First Lastname".capitalize == "First lastname"
That might not be the wanted result.
If I understood correctly you are going to capitalize the string anyway, so why bother checking if it's already capitalized?
Based on Tonttu's answer, I would suggest not worrying too much and just capitalizing like this:
new_string = string[0...1].capitalize + string[1..-1]
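One edge case with the slicing approach: for an empty string, string[1..-1] is nil, so the concatenation raises a TypeError. A minimal sketch of a guarded version follows; upcase_first_char is a hypothetical helper name, not a built-in.

```ruby
# Uppercases only the first character and leaves the rest untouched.
def upcase_first_char(str)
  return str if str.empty?  # ""[1..-1] would be nil, so bail out early

  str[0].upcase + str[1..-1]
end

puts "First Lastname".capitalize               # First lastname
puts upcase_first_char("first Lastname")       # First Lastname
puts upcase_first_char("mystical characters")  # Mystical characters
```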
I ran into Tonttu's problem while importing a bunch of names. I went with:
strs = "first lastname".split(" ")
return_string = ""
strs.each do |str|
  return_string += "#{str[0].upcase}#{str[1..str.length].downcase} "
end
return_string.chop
EDIT: The inevitable refactor (over a year) later.
"first lastname".split(" ").map do |str|
  "#{str[0].upcase}#{str[1..str.length].downcase}"
end.join(' ')
While definitely not easier to read, it gets the same result while declaring fewer temporary variables.
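Since String#capitalize already uppercases the first character and lowercases the rest, the per-word step can probably be reduced to a single call. A sketch, assuming words are separated by whitespace (variable names are mine):

```ruby
name = "first lastname"
titled = name.split.map(&:capitalize).join(' ')

# Mixed-case input is normalised as well.
shouted = "jOHN sMITH".split.map(&:capitalize).join(' ')

puts titled   # First Lastname
puts shouted  # John Smith
```

Note that split without arguments collapses runs of whitespace, so the original spacing between words is not preserved.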
I guess you could write something like:
string.capitalize unless string =~ /^[A-Z].*/
Personally I would just do string.capitalize
Unless you already keep a flag for capitalized strings that you are going to check anyway, just capitalize without checking.
Also the capitalization itself is probably performing some checking.
