How to eliminate multiple instances of a substring from a string - ruby

I got a string
str = "<strong><strong>HI</strong></strong>".
I want to eliminate duplicate instances of <strong> and </strong> to get <strong>HI</strong>.
There could potentially be more than two instances.
I tried to use squeeze('<strong>'), but it doesn't seem to do the job.

you could do this:
str = "<strong><strong>HI</strong></strong>"
str.gsub('<strong><strong>', '<strong>').
gsub('</strong></strong>', '</strong>') # => "<strong>HI</strong>"

Other option would be to use regex to get the desired result.
str.match(/<[a-z]+>[a-zA-Z]+<[\/a-z]+>/)
=> #<MatchData "<strong>HI</strong>">

you can do something like this
str = "<strong><strong>HI</strong></strong>"
p str[/<.*?>/]+ str[/>([^<]*)<\//,1]+str[/<\/.*?>/]

This can be done multiple ways, one of which is using gsub, like this:
str
.gsub(/(<strong>)+/, '<strong>')
.gsub(/(<\/strong>)+/, '</strong>')
The first gsub replaces one or more consecutive instances of <strong> with one <strong>, and the second gsub replaces one or more consecutive instances of </strong> with one </strong> (note the / in the regular expression part needs to be escaped).
It's worth noting that this approach does not ensure there are a matched number of opening and closing <strong> tags.

Related

How to use gsubstitution with more letters

I've printed the code, wit ruby
string = "hahahah"
pring string.gsub("a","b")
How do I add more letter replacements into gsub?
string.gsub("a","b")("h","l") and string.gsub("a","b";"h","l")
didnt work...
*update I have tried this too but without any success .
letters = {
"a" => "l"
"b" => "n"
...
"z" => "f"
}
string = "hahahah"
print string.gsub(\/w\,letters)
You're overcomplicating. As with most method calls in Ruby, you can simply chain #gsub calls together, one after the other:
str = 'adfh'
print str.gsub("a","b").gsub("h","l") #=> 'bdfl'
What you're doing here is applying the second #gsub to the result of the first one.
Of course, that gets a bit long-winded if you do too many of them. So, when you find yourself stringing too many together, you'll want to look for a regex solution. Rubular is a great place to tinker with them.
The way to use your hash trick with #gsub and a regex expression is to provide a hash for all possible matches. This has the same result as the two #gsub calls:
print str.gsub(/[ah]/, {'a'=>'b', 'h'=>'l'}) #=> 'bdfl'
The regex matches either a or h (/[ah]/), and the hash is saying what to substitute for each of them.
All that said, str.tr('ah', 'bl') is the simplest way to solve your problem as specified, as some commenters have mentioned, so long as you are working with single letters. If you need to work with two or more characters per substitution, you'll need to use #gsub.

Capturing groups don't work as expected with Ruby scan method

I need to get an array of floats (both positive and negative) from the multiline string. E.g.: -45.124, 1124.325 etc
Here's what I do:
text.scan(/(\+|\-)?\d+(\.\d+)?/)
Although it works fine on regex101 (capturing group 0 matches everything I need), it doesn't work in Ruby code.
Any ideas why it's happening and how I can improve that?
See scan documentation:
If the pattern contains no groups, each individual result consists of the matched string, $&. If the pattern contains groups, each individual result is itself an array containing one entry per group.
You should remove capturing groups (if they are redundant), or make them non-capturing (if you just need to group a sequence of patterns to be able to quantify them), or use extra code/group in case a capturing group cannot be avoided.
In this scenario, the capturing group is used to quantifiy a pattern sequence, thus all you need to do is convert the capturing group into a non-capturing one by replacing all unescaped ( with (?: (there is only one occurrence here):
text = " -45.124, 1124.325"
puts text.scan(/[+-]?\d+(?:\.\d+)?/)
See demo, output:
-45.124
1124.325
Well, if you need to also match floats like .04 you can use [+-]?\d*\.?\d+. See another demo
There are cases when you cannot get rid of a capturing group, e.g. when the regex contains a backreference to a capturing group. In that case, you may either a) declare a variable to store all matches and collect them all inside a scan block, or b) enclose the whole pattern with another capturing group and map the results to get the first item from each match, c) you may use a gsub with just a regex as a single argument to return an Enumerator, with .to_a to get the array of matches:
text = "11234566666678"
# Variant a:
results = []
text.scan(/(\d)\1+/) { results << Regexp.last_match(0) }
p results # => ["11", "666666"]
# Variant b:
p text.scan(/((\d)\2+)/).map(&:first) # => ["11", "666666"]
# Variant c:
p text.gsub(/(\d)\1+/).to_a # => ["11", "666666"]
See this Ruby demo.
([+-]?\d+\.\d+)
assumes there is a leading digit before the decimal point
see demo at Rubular
If you need capture groups for a complex pattern match, but want the entire expression returned by .scan, this can work for you.
Suppose you want to get the image urls in this string perhaps from a markdown text with html image tags:
str = %(
Before
<img src="https://images.zenhubusercontent.com/11223344e051aa2c30577d9d17/110459e6-915b-47cd-9d2c-1842z4b73d71">
After
<img src="https://user-images.githubusercontent.com/111222333/75255445-f59fb800-57af-11ea-9b7a-a235b84bf150.png">).strip
You may have a regular expression defined to match just the urls, and maybe used a Rubular example like this to build/test your Regexp
image_regex =
/https\:\/\/(user-)?images.(githubusercontent|zenhubusercontent).com.*\b/
Now you don't need each sub-capture group, but just the the entire expression in your your .scan, you can just wrap the whole pattern inside a capture group and use it like this:
image_regex =
/(https\:\/\/(user-)?images.(githubusercontent|zenhubusercontent).com.*\b)/
str.scan(image_regex).map(&:first)
=> ["https://user-images.githubusercontent.com/1949900/75255445-f59fb800-57af-11ea-9b7a-e075f55bf150.png",
"https://user-images.githubusercontent.com/1949900/75255473-02bca700-57b0-11ea-852a-58424698cfb0.png"]
How does this actually work?
Since you have 3 capture groups, .scan alone will return an Array of arrays with, one for each capture:
str.scan(image_regex)
=> [["https://user-images.githubusercontent.com/111222333/75255445-f59fb800-57af-11ea-9b7a-e075f55bf150.png", "user-", "githubusercontent"],
["https://images.zenhubusercontent.com/11223344e051aa2c30577d9d17/110459e6-915b-47cd-9d2c-0714c8f76f68", nil, "zenhubusercontent"]]
Since we only want the 1st (outter) capture group, we can just call .map(&:first)

Replace/strip empty lines using gsub

I have an HTML page:
<strong>
Product Name:
</strong>
I want to strip its empty lines (^\n or ^$). Expected HTML is:
<strong>
Product Name:
</strong>
Here is my syntax:
r.gsub!(/^\\n/, '')
It doesn't seem to work. I tried many combinations and I can't get it to do anything. puts r.class => string and r always have spaces in them. I'm actually trying a larger set of reductions:
r.gsub!(/\\n\s+?/, '').gsub!(/\\t\s+?/, '').gsub!(/^\\n/, '')
The problem seems to be that you are escaping backslashes when you shouldn't be. E.g. /\\n/ will match the string \n, not a newline character. /\n/ will match a newline character. Same goes for \t.
If you want to play around with Ruby regular expressions, I recommend checking out Rubular.
Also, be careful with gsub!, especially chaining them like that. gsub! returns nil if nothing is replaced and you will get an undefined method for nil error on subsequent calls. You're much better off with
r = r.gsub(...).gsub(...) ...
I got it to work.
r = r.gsub(/\t\s+?/, "")
r = r.gsub(/^\s*$/, "")
The "\n" can be encapsulated by \s*. $ does not mean \n.

ruby regex to match multiple occurrences of pattern

I am looking to build a ruby regex to match multiple occurrences of a pattern and return them in an array. The pattern is simply: [[.+]]. That is, two left brackets, one or more characters, followed by two right brackets.
This is what I have done:
str = "Some random text[[lead:first_name]] and more stuff [[client:last_name]]"
str.match(/\[\[(.+)\]\]/).captures
The regex above doesn't work because it returns this:
["lead:first_name]] and another [[client:last_name"]
When what I wanted was this:
["lead:first_name", "client:last_name"]
I thought if I used a noncapturing group that for sure it should solve the issue:
str.match(/(?:\[\[(.+)\]\])+/).captures
But the noncapturing group returns the same exact wrong output. Any idea on how I can resolve my issue?
The problem with your regex is that the .+ part is "greedy", meaning that if the regex matches both a smaller and larger part of the string, it will capture the larger part (more about greedy regexes).
In Ruby (and most regex syntaxes), you can qualify your + quantifier with a ? to make it non-greedy. So your regex would become /(?:\[\[(.+?)\]\])+/.
However, you'll notice this still doesn't work for what you want to do. The Ruby capture groups just don't work inside a repeating group. For your problem, you'll need to use scan:
"[[a]][[ab]][[abc]]".scan(/\[\[(.+?)\]\]/).flatten
=> ["a", "ab", "abc"]
Try this:
=> str.match(/\[\[(.*)\]\].*\[\[(.*)\]\]/).captures
=> ["lead:first_name", "client:last_name"]
With many occurrences:
=> str
=> "Some [[lead:first_name]] random text[[lead:first_name]] and more [[lead:first_name]] stuff [[client:last_name]]"
=> str.scan(/\[(\w+:\w+)\]/)
=> [["lead:first_name"], ["lead:first_name"], ["lead:first_name"], ["client:last_name"]]

String gsub - Replace characters between two elements, but leave surrounding elements

Suppose I have the following string:
mystring = "start/abc123/end"
How can you splice out the abc123 with something else, while leaving the "/start/" and "/end" elements intact?
I had the following to match for the pattern, but it replaces the entire string. I was hoping to just have it replace the abc123 with 123abc.
mystring.gsub(/start\/(.*)\/end/,"123abc") #=> "123abc"
Edit: The characters between the start & end elements can be any combination of alphanumeric characters, I changed my example to reflect this.
You can do it using this character class : [^\/] (all that is not a slash) and lookarounds
mystring.gsub(/(?<=start\/)[^\/]+(?=\/end)/,"7")
For your example, you could perhaps use:
mystring.gsub(/\/(.*?)\//,"/7/")
This will match the two slashes between the string you're replacing and putting them back in the substitution.
Alternatively, you could capture the pieces of the string you want to keep and interpolate them around your replacement, this turns out to be much more readable than lookaheads/lookbehinds:
irb(main):010:0> mystring.gsub(/(start)\/.*\/(end)/, "\\1/7/\\2")
=> "start/7/end"
\\1 and \\2 here refer to the numbered captures inside of your regular expression.
The problem is that you're replacing the entire matched string, "start/8/end", with "7". You need to include the matched characters you want to persist:
mystring.gsub(/start\/(.*)\/end/, "start/7/end")
Alternatively, just match the digits:
mystring.gsub(/\d+/, "7")
You can do this by grouping the start and end elements in the regular expression and then referring to these groups in in the substitution string:
mystring.gsub(/(?<start>start\/).*(?<end>\/end)/, "\\<start>7\\<end>")

Resources