Ruby Regex Replace Last Occurence of Grouping - ruby

In Ruby regular expressions I would like to use gsub to replace a last occurrence of a grouping, if it occurs, otherwise, perform a replacement anyways at a default location. I am trying to replace the last occurrence of a number in the 40s (40...49). I have the following regular expression, which is correctly capturing the grouping I would like in '\3':
/(([1-3,5-9][0-9]|([4][0-9]))[a-z])*Foo/
Some sample strings I am using this regex on are:
12a23b34c45d56eFoo
12a45b34c46d89eFoo
45aFoo
Foo
12a23bFoo
12a23b445cFoo
Using https://regex101.com/, I see the last number in 40s is captured in '\3'. I would then like to somehow perform string.gsub(regex, '\3' => 'NEW') to replace this last occurrence or append before Foo if not present. My desired results would be:
12a23b34cNEWd56eFoo
12a45b34cNEWd89eFoo
NEWaFoo
NEWFoo
12a23bNEWFoo
12a23b4NEWcFoo

If I correctly understood, you are interested in gsub with codeblock:
str.gsub(PATTERN) { |mtch|
puts mtch # the whole match
puts $~[3] # the third group
mtch.gsub($~[3], 'NEW') # the result
}
'abc'.gsub(/(b)(c)/) { |m| m.gsub($~[2], 'd') }
#⇒ "abd"
Probably you should handle the case when there are no 40-s occureneces at all, like:
gsub($~[1], "NEW$~[1]") if $~[3].nil?
To handle all the possible cases, one might declare the group for Foo:
# NOTE THE GROUP ⇓⇓⇓⇓⇓
▶ re = /(([1-3,5-9][0-9]|([4][0-9]))[a-z])*(Foo)/
#⇒ /(([1-3,5-9][0-9]|([4][0-9]))[a-z])*(Foo)/
▶ inp.gsub(re) do |mtch|
▷ $~[3].nil? ? mtch.gsub($~[4], "NEW#{$~[4]}") : mtch.gsub(/#{$~[3]}/, 'NEW')
▷ end
#⇒ "12a23b34cNEWd56eFoo\n12a45b34cNEWd89eFoo\nNEWaFoo\nNEWFoo\n12a23bNEWFoo"
Hope it helps.

I suggest the following:
'12a23b34c45d56eFoo'.gsub(/(([1-3,5-9][0-9]|([4][0-9]))[a-z])*Foo/) {
if Regexp.last_match[3].nil? then
puts "Append before Foo"
else
puts "Replace group 3"
end
}
You'd need to find a way to append or replace accordingly or maybe someone can edit with a concise code...

Related

Replace all words before the start of the first word (Regex and Ruby)

Here are my test cases.
Expected:
JUNKINFRONThttp://francium.tech should be http://francium.tech
JUNKINFRONThttp://francium.tech/http should be http://francium.tech/http
francium.tech/http should be francium.tech/http (unaffected)
Actual result:
http://francium.tech
francium.tech/http
http
I am trying to write a regex replace for this. I tried this,
text.sub(/.*http/,'http')
However, my second and third test cases fail because it searches till the end. It would help if the answer could also do the case insensitivity.
2.5.0 :001 > url = 'francium.tech/http'
=> "francium.tech/http"
2.5.0 :002 > url.sub(/^.*?(?=http)/i,'')
=> "http"
As per my original comments, you can use the pattern as shown below. If you want a really small performance gain, you can remove one step in the regex by using the second pattern instead. If you're especially concerned with performance, the last one performs even quicker.
^.*?(?=https?://)
^.*?(?=https?:/{2})
^.*?(?=ht{2}ps?:/{2})
See code in use here
strings = [
"JUNKINFRONThttp://francium.tech",
"JUNKINFRONThttp://francium.tech/http",
"francium.tech/http"
]
strings.each { |s| puts s.sub(%r{^.*?(?=https?://)}, '') }
Outputs the following:
http://francium.tech
http://francium.tech/http
francium.tech/http
I think this may solve your problem.
str1 = 'JUNKINFRONThttp://francium.tech'# should be http://francium.tech
str2 = 'JUNKINFRONThttp://francium.tech/http'# should be http://francium.tech/http
str3 = 'francium.tech/http' #should be francium.tech/http (unaffected)
str4 = 'JUNKINFRONThttps://francium.tech/http'# should be https://francium.tech/http
[str1, str2, str3, str4].each do |str|
puts str.gsub(/^.*(http|https):\/\//i, "\\1://")
end
Result:
http://francium.tech
http://francium.tech/http
francium.tech/http
https://francium.tech/http
When using regex you should make sure to use unique strings like http:\\ or better http:\\[SOMETHING].[AT_LEAST_TWO_CHARS][MAYBE_A_SLASH] and so on...
This works for your given cases:
str = ['JUNKINFRONThttp://francium.tech',
'JUNKINFRONThttp://francium.tech/http',
'francium.tech/http']
str.each do |str|
puts str.sub(/^.*?(https?:\/{2})/, '\1') # with capturing group
puts str.sub(/^.*?(?=https?:\/{2})/, '') # with positive lookahead
end
By using a group we can use it for the replacement, another method would be to use a positive lookahead

Stuck in Abbreviation implementation to ruby string

I want to convert all the words(alphabetic) in the string to their abbreviations like i18n does. In other words I want to change "extraordinary" into "e11y" because there are 11 characters between the first and the last letter in "extraordinary". It works with a single word in the string. But how can I do the same for a multi-word string? And of course if a word is <= 4 there is no point to make an abbreviation from it.
class Abbreviator
def self.abbreviate(x)
x.gsub(/\w+/, "#{x[0]}#{(x.length-2)}#{x[-1]}")
end
end
Test.assert_equals( Abbreviator.abbreviate("banana"), "b4a", Abbreviator.abbreviate("banana") )
Test.assert_equals( Abbreviator.abbreviate("double-barrel"), "d4e-b4l", Abbreviator.abbreviate("double-barrel") )
Test.assert_equals( Abbreviator.abbreviate("You, and I, should speak."), "You, and I, s4d s3k.", Abbreviator.abbreviate("You, and I, should speak.") )
Your mistake is that your second parameter is a substitution string operating on x (the original entire string) as a whole.
Instead of using the form of gsub where the second parameter is a substitution string, use the form of gsub where the second parameter is a block (listed, for example, third on this page). Now you are receiving each substring into your block and can operate on that substring individually.
def short_form(str)
str.gsub(/[[:alpha:]]{4,}/) { |s| "%s%d%s" % [s[0], s.size-2, s[-1]] }
end
The regex reads, "match four or more alphabetic characters".
short_form "abc" # => "abc"
short_form "a-b-c" #=> "a-b-c"
short_form "cats" #=> "c2s"
short_form "two-ponies-c" #=> "two-p4s-c"
short_form "Humpty-Dumpty, who sat on a wall, fell over"
#=> "H4y-D4y, who sat on a w2l, f2l o2r"
I would recommend something along the lines of this:
class Abbreviator
def self.abbreviate(x)
x.gsub(/\w+/) do |word|
# Skip the word unless it's long enough
next word unless word.length > 4
# Do the same I18n conversion you do before
"#{word[0]}#{(word.length-2)}#{word[-1]}"
end
end
end
The accepted answer isn't bad, but it can be made a lot simpler by not matching words that are too short in the first place:
def abbreviate(str)
str.gsub(/([[:alpha:]])([[:alpha:]]{3,})([[:alpha:]])/i) { "#{$1}#{$2.size}#{$3}" }
end
abbreviate("You, and I, should speak.")
# => "You, and I, s4d s3k."
Alternatively, we can use lookbehind and lookahead, which makes the Regexp more complex but the substitution simpler:
def abbreviate(str)
str.gsub(/(?<=[[:alpha:]])[[:alpha:]]{3,}(?=[[:alpha:]])/i, &:size)
end

regex for a pattern at end of string

I have a string which looks like:
hello/world/1.9.2-some-text
hello/world/2.0.2-some-text
hello/world/2.11.0
Through regex I want to get the string after last '/' and until end of line i.e. in above examples output should be 1.9.2-some-text, 2.0.2-some-text, 2.11.0
I tried this - ^(.+)\/(.+)$ which returns me an array of which first object is "hello/world" and 2nd object is "1.9.2-some-text"
Is there a way to just get "1.9.2-some-text" as the output?
Try using a negative character class ([^…]) like this:
[^\/]+$
This will match one or more of any character other than / followed by the end of the string.
You can use a negated match here.
'hello/world/1.9.2-some-text'.match(Regexp.new('[^/]+$'))
# => "1.9.2-some-text"
Meaning any character except: / (1 or more times) followed by the end of the string.
Although, the simplest way would be to split the string.
'hello/world/1.9.2-some-text'.split('/').last
# => "1.9.2-some-text"
OR
'hello/world/1.9.2-some-text'.split('/')[-1]
# => "1.9.2-some-text"
If you do not need to use a regex, the ordinary way of doing such thing is:
File.basename("hello/world/1.9.2-some-text")
#=> "1.9.2-some-text"
This is one way:
s = 'hello/world/1.9.2-some-text
hello/world/2.0.2-some-text
hello/world/2.11.0'
s.lines.map { |l| l[/.*\/(.*)/,1] }
#=> ["1.9.2-some-text", "2.0.2-some-text", "2.11.0"]
You said, "in above examples output should be 1.9.2-some-text, 2.0.2-some-text, 2.11.0". That's neither a string nor an array, so I assumed you wanted an array. If you want a string, tack .join(', ') onto the end.
Regex's are naturally "greedy", so .*\/ will match all characters up to and including the last / in each line. 1 returns the contents of the capture group (.*) (capture group 1).

Ruby: Use condition result in condition block

I have such code
reg = /(.+)_path/
if reg.match('home_path')
puts reg.match('home_path')[0]
end
This will eval regex twice :(
So...
reg = /(.+)_path/
result = reg.match('home_path')
if result
puts result[0]
end
But it will store variable result in memory till.
I have one functional-programming idea
/(.+)_path/.match('home_path').compact.each do |match|
puts match[0]
end
But seems there should be better solution, isn't it?
There are special global variables (their names start with $) that contain results of the last regexp match:
r = /(.+)_path/
# $1 - the n-th group of the last successful match (may be > 1)
puts $1 if r.match('home_path')
# => home
# $& - the string matched by the last successful match
puts $& if r.match('home_path')
# => home_path
You can find full list of predefined global variables here.
Note, that in the examples above puts won't be executed at all if you pass a string that doesn't match the regexp.
And speaking about general case you can always put assignment into condition itself:
if m = /(.+)_path/.match('home_path')
puts m[0]
end
Though, many people don't like that as it makes code less readable and gives a good opportunity for confusing = and ==.
My personal favorite (w/ 1.9+) is some variation of:
if /(?<prefix>.+)_path/ =~ "home_path"
puts prefix
end
If you really want a one-liner: puts /(?<prefix>.+)_path/ =~ 'home_path' ? prefix : false
See the Ruby Docs for a few limitations of named captures and #=~.
From the docs: If a block is given, invoke the block with MatchData if match succeed.
So:
/(.+)_path/.match('home_path') { |m| puts m[1] } # => home
/(.+)_path/.match('homepath') { |m| puts m[1] } # prints nothing
How about...
if m=/regex here/.match(string) then puts m[0] end
A neat one-line solution, I guess :)
how about this ?
puts $~ if /regex/.match("string")
$~ is a special variable that stores the last regexp match. more info: http://www.regular-expressions.info/ruby.html
Actually, this can be done with no conditionals at all. (The expression evaluates to "" if there is no match.)
puts /(.+)_path/.match('home_xath').to_a[0].to_s

Capture arbitrary string before either '/' or end of string

Suppose I have:
foo/fhqwhgads
foo/fhqwhgadshgnsdhjsdbkhsdabkfabkveybvf/bar
And I want to replace everything that follows 'foo/' up until I either reach '/' or, if '/' is never reached, then up to the end of the line. For the first part I can use a non-capturing group like this:
(?<=foo\/).+
And that's where I get stuck. I could match to the second '/' like this:
(?<=foo\/).+(?=\/)
That doesn't help for the first case though. Desired output is:
foo/blah
foo/blah/bar
I'm using Ruby.
Try this regex:
/(?<=foo\/)[^\/]+/
Implementing #Endophage's answer:
def fix_post_foo_portion(string)
portions = string.split("/")
index_to_replace = portions.index("foo") + 1
portions[index_to_replace ] = "blah"
portions.join("/")
end
strings = %w{foo/fhqwhgads foo/fhqwhgadshgnsdhjsdbkhsdabkfabkveybvf/bar}
strings.each {|string| puts fix_post_foo_portion(string)}
I'm not a ruby dev but is there some equivalent of php's explode() so you could explode the string, insert a new item at the second array index then implode the parts with / again... Of course you can match on the first array element if you only want to do the switch in certain cases.
['foo/fhqwhgads', 'foo/fhqwhgadshgnsdhjsdbkhsdabkfabkveybvf/bar'].each do |s|
puts s.sub(%r|^(foo/)[^/]+(/.*)?|, '\1blah\2')
end
Output:
foo/blah
foo/blah/bar
I'm too tired to think of a nicer way to do it but I'm sure there is one.
Checking for the end-of-string anchor -- $ -- as well as the / character should do the trick. You'll also need to make the .+ non-greedy by changing it to .+? since the greedy version will always match right up to the end of the string, given the chance.
(?<=foo\/).+?(?=\/|$)

Resources